(1.) Introductory.

In a series of memoirs presented to the Royal Society I have endeavoured to show
that the Gaussian-Laplace normal distribution is very far from being a general law of
frequency distribution either for errors of observation* or for the distribution of
deviations from type such as occur in organic populations.† It is quite true that the

* "On Errors of Judgment, &c.," 'Phil. Trans.,' A, vol. 198, pp. 235-299.
† "On Skew Variation, &c.," 'Phil. Trans.,' A, vol. 186, pp. 343-414.
4 PROFESSOR K. PEARSON ON THE GENERAL THEORY OF

normal distribution applies within certain fields with a remarkable degree of accuracy, 
notably in a whole series of anthropometric, particularly craniometric, observations.* 
In other fields it is not even approximately correct, for example in the distribution of 
barometric variations,† of grades of fertility and incidence of disease.‡ For such
cases I have introduced a series of skew frequency curves which serve the purpose of
describing the frequency of innumerable skew distributions well within the errors of
random sampling. An exact test for "goodness of fit" in the case of frequency
distributions has also now been provided.§

In dealing with frequency which diverges more or less conspicuously from the
normal law we require to bear in mind at least three important points:

(i.) Any expression for frequency must be a graduation formula. It is not a
disadvantage, but a fundamental requisite that it should smooth off "Scheingipfeln,"
so far as these are irregularities within the limits of random sampling.

Hence formulae like those provided by Thiele‖ and Wundt's pupils,¶ which depend
upon taking enough "moments" to reproduce the complete frequency, are a priori
fallacious. Many interpolation formulae would do this completely, but such
interpolation formulae are not graduation formulae.

(ii.) The graduation formula must not depend upon the calculation of constants 
having such a high probable error that their value is practically worthless. 

Now, the probable error of high moments and products increases rapidly with their
dimensions; hence there is, beyond the labour of arithmetic, a practical limit to the
number of moments or products which can be effectively used in a graduation
formula.

(iii.) There must be a systematic method of approaching frequency distributions, 
which can be applied to all cases with reasonably practical ease. 

Now the immense majority, if not the totality, of frequency distributions in homo- 
geneous material show, when the frequency is indefinitely increased, a tendency to 
give a smooth curve characterised by the following properties : — 

(i.) The frequency starts from zero, increases slowly or rapidly to a maximum, and 
then falls again to zero — probably at a quite different rate — as the character for which 
the frequency is measured is steadily increased. This is the almost universal 
unimodal distribution of the frequency of homogeneous series. Homogeneity may 

* 'Biometrika,' vol. I., p. 443; vol. II., p. 344; vol. III., p. 230.

† 'Phil. Trans.,' A, vol. 190, pp. 423-469.

‡ 'Phil. Trans.,' A, vol. 192, pp. 257-330; 'The Chances of Death,' vol. I., pp. 69, et seq.; 'Biometrika,'
vol. I., p. 134 and p. 292; and for disease, 'Phil. Trans.,' A, vol. 186, pp. 390 and 407; A, vol. 197,
p. 159.

§ 'Phil. Mag.,' vol. 50, 1900, pp. 157-174, and 'Biometrika,' vol. I., pp. 154-163.

‖ 'Forelaesninger over Almindelig Iagttagelseslaere,' Kjobenhavn, 1889; 'Theory of Observations,'
London, 1903.

¶ Wundt, 'Philosophische Studien.' A whole series of papers, by G. F. Lipps and others, seems to me
to quite miss the point of (i.) and (ii.) above.

for practical purposes be taken to imply unimodality, although the converse is very
far from true. 

(ii.) In the next place there is generally contact of the frequency curve at the
extremities of the range. These characteristics at once suggest the following form of
frequency curve, if $y\,\delta x$ measure the frequency falling between $x$ and $x+\delta x$:

$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{F(x)}$ (i.).

For in this case we have one mode only of the frequency, i.e., at $x=-a$, and
$dy/dx$ will vanish when $y=0$.

But the assumption of this form, as long as $F(x)$ is general, is itself extremely
general, and it includes cases in which $dy/dx$ may not be zero, but take any values
from $0$ to $\infty$, when $y=0$.*

Now let us assume that $F(x)$ can be expanded by Maclaurin's theorem, and
equals $b_0+b_1x+b_2x^2+b_3x^3+\ldots$ Then our differential equation to the frequency
will be

$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{b_0+b_1x+b_2x^2+b_3x^3+\ldots}$ (ii.).

There is now absolutely no difficulty in determining the unknown constants in
terms of the moments of the system. Multiply up and also by $x^n$, and then integrate
throughout the range of frequency, we have

$\int x^n(b_0+b_1x+b_2x^2+b_3x^3+\ldots)\frac{dy}{dx}\,dx = \int y\,(x+a)\,x^n\,dx$ (iii.).

Or, noting that $y=0$ at the ends of the range, we have, with the usual notation for a
total frequency N, i.e.,

$N\mu'_n = \int y\,x^n\,dx$ (iv.),

the result by integration by parts

$a\mu'_n + nb_0\mu'_{n-1} + (n+1)b_1\mu'_n + (n+2)b_2\mu'_{n+1} + (n+3)b_3\mu'_{n+2} + \ldots = -\mu'_{n+1}$ (v.).

Hence, if we write $n = 0, 1, 2, 3 \ldots s$ successively, we have $s+1$ equations to find
$a, b_0, b_1, b_2 \ldots b_{s-1}$ in terms of the moments. For example, if we stop at $b_0$ we
require two moments, at $b_1$ three moments, at $b_2$ four moments, at $b_3$ six moments, at
$b_4$ eight moments, and at $b_{s-1}$, $s>2$, $2s-2$ moments.
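As an illustrative check on these moment equations, the sketch below builds the n-th equation in the form $a\mu'_n + \sum_k (n+k)\,b_k\,\mu'_{n+k-1} = -\mu'_{n+1}$, which is what integrating (iii.) by parts gives when $y$ vanishes at both ends of the range, and solves the system exactly in rational arithmetic. It is not part of the memoir: the function names `solve_linear` and `pearson_constants` are hypothetical, and the moment values in the usage note are arbitrary test numbers rather than observed data.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(size):
        # Pick any row at or below the diagonal with a non-zero entry.
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]


def pearson_constants(mu, order):
    """Return [a, b_0, ..., b_order] from moments mu[0..order+2].

    mu[k] is the k-th moment about the chosen origin, with mu[0] = 1.
    Equation n (n = 0 .. order+1) reads
        a*mu[n] + sum_k (n+k)*b_k*mu[n+k-1] = -mu[n+1].
    """
    rows, rhs = [], []
    for n in range(order + 2):
        row = [mu[n]]
        for k in range(order + 1):
            # The coefficient is zero when n + k = 0 (no mu[-1] is involved).
            row.append((n + k) * mu[n + k - 1] if n + k > 0 else 0)
        rows.append(row)
        rhs.append(-mu[n + 1])
    return solve_linear(rows, rhs)


# Origin at the mean (mu_1 = 0), with mu_2 = 2, mu_3 = 1, mu_4 = 15.
constants = pearson_constants([1, 0, 2, 1, 15], order=2)
print(constants)
```

With the origin at the mean and normal moments ($\mu_2 = 1$, $\mu_3 = 0$, $\mu_4 = 3$) the same call returns $a = 0$, $b_0 = -1$, $b_1 = b_2 = 0$, so the differential equation reduces to the normal form.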

* For example, cases in which there is a minimum frequency or antimode at $x=-a$, and $dy/dx$ infinite at
one or two values for which $y=0$, as in the frequency distributions discussed in 'Phil. Trans.,' A, vol. 186,
pp. 364-5, and 'Roy. Soc. Proc.,' vol. 62, p. 287, "Cloudiness, a Novel Case of Frequency."
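There is no difficulty whatever in finding the b's; we have the system of equations,
where $\mu'_0=1$:

$$\begin{aligned}
\mu'_0a + 0\cdot b_0 + \mu'_0b_1 + 2\mu'_1b_2 + 3\mu'_2b_3 + 4\mu'_3b_4 + \ldots &= -\mu'_1 \\
\mu'_1a + \mu'_0b_0 + 2\mu'_1b_1 + 3\mu'_2b_2 + 4\mu'_3b_3 + 5\mu'_4b_4 + \ldots &= -\mu'_2 \\
\mu'_2a + 2\mu'_1b_0 + 3\mu'_2b_1 + 4\mu'_3b_2 + 5\mu'_4b_3 + 6\mu'_5b_4 + \ldots &= -\mu'_3 \\
\mu'_3a + 3\mu'_2b_0 + 4\mu'_3b_1 + 5\mu'_4b_2 + 6\mu'_5b_3 + 7\mu'_6b_4 + \ldots &= -\mu'_4 \\
\mu'_4a + 4\mu'_3b_0 + 5\mu'_4b_1 + 6\mu'_5b_2 + 7\mu'_6b_3 + 8\mu'_7b_4 + \ldots &= -\mu'_5
\end{aligned}$$

(vi.).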



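Hence, $a, b_0, b_1, b_2, b_3, \ldots$ are at once given in terms of the determinant $\Delta$ and
its minors, where:

$$\Delta = \begin{vmatrix}
\mu'_0 & 0 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & \ldots \\
\mu'_1 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & \ldots \\
\mu'_2 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & \ldots \\
\mu'_3 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & \ldots \\
\mu'_4 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & 8\mu'_7 & \ldots \\
\ldots & & & & & &
\end{vmatrix}$$

(vii.).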

The results may be simplified slightly by taking the origin at the mean, and the
moments about the mean, indicating this by dropping the dashes and putting $\mu_1 = 0$.

Thus we have the following series of frequency curves, the origin being the 
mean : — 

(i.) Keeping $b_0$ only

$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2}$ (viii.).

This is the Laplace-Gaussian normal form.

(ii.) Keeping $b_0$, $b_1$ only

$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\mu_3/(2\mu_2)}{\mu_2+\mu_3x/(2\mu_2)}$ (ix.).

This is the Type III. curve of my memoir on skew variation.*

(iii.) Keeping $b_0$, $b_1$, $b_2$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}}{\dfrac{\mu_2(4\mu_2\mu_4-3\mu_3^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x+\dfrac{2\mu_2\mu_4-3\mu_3^2-6\mu_2^3}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x^2}$$

(x.).

* 'Phil. Trans.,' A, vol. 186, p. 373.
This equation gave Types I.-VI. of my two memoirs on skew variation,* and
provides at once the expressions

$d$ = distance from mode to mean $= \dfrac{\sigma\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}$ (xi.),

skewness $= \dfrac{\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2-6\beta_1-9)}$ (xii.),

where $\sigma=\sqrt{\mu_2}$, $\beta_1=\mu_3^2/\mu_2^3$, $\beta_2=\mu_4/\mu_2^2$, given in my memoir on the theory of errors
of observation without proof.†

There is no theoretical limit, however, to this process; we can from (vi.) and (vii.)
express the a and b's at once in terms of determinants, and expanding obtain forms
which, like the formulae of Thiele, will fit closer and closer to the observed
distribution of frequency, the more moments we take. But there are three fundamental
practical objections to this. These are the following:

(a.) Experience shows that the form (x.) suffices for certainly the great bulk of 
frequency distributions, i.e., it describes them effectively within the limits of random 
sampling. 

If the distribution be even approximately normal, the series in the denominator
converges very rapidly, for the coefficients of every power of x vanish for moments
obeying the relationships:

$\mu_{2s+1} = 0$, $\mu_{2s} = (2s-1)\mu_2\mu_{2s-2}$,

which hold for a normal series.

(b.) The labour of arithmetic and of analysis becomes very great, if we desire to
keep higher moments. If we go to $b_4$ we should have to calculate the first eight
moments of the observations about their centroid, a by no means easy task. Further,
the classification of the resulting curves and the criteria for the right one to use in a
special case, although not absolutely prohibitive, if we only go as far as $b_3$, are for
practical purposes idle in the case of taking into account $b_4$.

(c.) The probable errors of the higher moments are so large that the values found
for $\mu_7$, $\mu_8$, &c., are quite untrustworthy, and even that for $\mu_6$ is doubtful,‡ unless we
have frequency series far larger than usually occur in actual observations. This is a
strong argument against the utility of any descriptions of frequency, such as those
suggested by Thiele or Lipps, which depend upon moments higher than the fifth
or sixth.

* 'Phil. Trans.,' A, vol. 186, pp. 343-414, and 'Phil. Trans.,' A, vol. 197, pp. 443-459.
† 'Phil. Trans.,' A, vol. 198, p. 277.

‡ In 'Phil. Trans.,' A, vol. 185, pp. 71-110, I have given a method of breaking up a frequency
distribution into two normal series. I obtained long ago the criterion for determining whether such a
resolution is possible or not. But it involves moments higher than the fifth, and the probable error of the
criterion is thus so great that for practical purposes it is worthless.
The question of the probable deviations of the higher moments can be illustrated as
follows, by finding the standard deviation of the moment when we take a number of
random samples from a general population. Let $\Sigma_{\mu_s}$ be the standard deviation of $\mu_s$,
then $100\,\Sigma_{\mu_s}/\mu_s$ is the percentage variability of $\mu_s$ due to random sampling. The table
below shows the increase of these percentages in the case of the moments of normal
distributions, which, quite as well as any other, will illustrate the rapid increase in
probable error as we use higher and higher moments. The general values of the
standard deviations of some of the moments were first given by Czuber,* then
far more completely by Sheppard,† and a resume of all the results recently in
'Biometrika.'‡

Percentage Variability in Moments due to Random Sampling when the Series
is supposed to be Normal.

Moment.     500 in series.     1000 in series.
$\mu_2$          6.3                4.5
$\mu_4$         14.6               10.3
$\mu_6$         30.1               21.3
$\mu_8$         60.6               42.9
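The entries of this table can be approximately reproduced from a first-order sampling formula. The sketch below assumes, as an illustration not stated in the text, that the variance of a sample moment of order $2k$ is $(\mu_{4k}-\mu_{2k}^2)/n$ and that the normal central moments are $\mu_{2k} = (2k-1)!!\,\sigma^{2k}$; the function names are hypothetical. Under these assumptions the computed percentages agree with the tabulated ones to within about 0.2.

```python
from math import sqrt


def normal_even_moment(k, sigma=1.0):
    """Central moment mu_{2k} of a normal law: (2k-1)!! * sigma**(2k)."""
    value = 1.0
    for odd in range(1, 2 * k, 2):
        value *= odd
    return value * sigma ** (2 * k)


def percent_variability(order, n):
    """100 * (s.d. of the sample moment) / (moment), for even `order`.

    Uses the first-order variance (mu_{2*order} - mu_order**2) / n,
    an assumption made for this sketch.
    """
    k = order // 2
    mu = normal_even_moment(k)
    mu_double = normal_even_moment(2 * k)
    return 100.0 * sqrt((mu_double - mu * mu) / n) / mu


for order in (2, 4, 6, 8):
    print(order,
          round(percent_variability(order, 500), 1),
          round(percent_variability(order, 1000), 1))
```

The percentages are independent of $\sigma$, and roughly double with each step from one even moment to the next.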



Precisely the same rapid increase takes place when we find the variabilities of the
ratios $\mu_4/\mu_2^2$, $\mu_6/\mu_2^3$, $\mu_8/\mu_2^4$, &c., which are the forms in which the moments actually
occur in our coefficients. In this case we have to remember that errors in the
moments are correlated, but the correlations are given in the papers cited above.§ I
find in this case the following series, which is almost as suggestive as the previous
table.

Percentage Variabilities in Ratio of Moments due to Random Sampling, the
Series being Normal.

Ratio.              500 in series.     1000 in series.
$\mu_4/\mu_2^2$          7.3                5.2
$\mu_6/\mu_2^3$         23.3               16.5
$\mu_8/\mu_2^4$         55.1               39.0



The order of this increase of percentage variability, and therefore of probable error, 
is the same for skew as for normal variation, and it seems therefore, with the length 

* 'Theorie der Beobachtungsfehler,' S. 130, et seq.

† 'Phil. Trans.,' A, vol. 192, pp. 122, et seq.

‡ Vol. II., pp. 273-281.

§ Ibid., p. 277.

of the series in customary use, idle to use the 7th or 8th moments; these have
variabilities varying from 30 to 60 per cent. of their values, and accordingly we might
easily on a random sample reach a 7th or 8th moment having half, or double the value
it actually has in the general population. Constants based on these high moments
will be practically idle. They may enable us to describe closely an individual random
sample, but no safe argument can be drawn from this individual sample as to the
general population at large, at any rate so far as the argument is based on the constants
depending upon these high moments.

It seems to me accordingly obvious that, bearing in mind the object of a theory of 
frequency (i.e., the description of the distribution in the general population by aid of 
a graduated sample, agreeing with the general population within the probable errors 
of random sampling), we can dismiss from practical use all theories which call upon 
us to use moments as high as the seventh or eighth. Any use of the general form 
(ii.) beyond $b_3$, indirectly or directly, involves such higher moments. Personally I am
inclined to doubt whether the continental series using higher moments are, from the
standpoint of graduation, nearly as good as my form (ii.).

Hence we seem driven to the skew curves embraced in (x.) as a practical frequency
series. If we have a frequency not described by (x.) we may, perhaps, use $\mu_5$ and $\mu_6$,*
but it is difficult to see how its description can possibly be bettered by the use of
still higher moments. This may seem a counsel of despair; but it is very far from
being so in reality when we remember that (x.) has proved its efficiency now, I might
almost say without exception, in a wide range of economic, physical, biometric, and
actuarial data.

In this memoir on skew correlation I shall accordingly confine my attention, for the 
most part, to constants the discovery of which does not involve the use of moments 
or products of higher than six dimensions, judging all above this limit to be, as a rule, 
disqualified for practical service by the magnitude of their probable errors. 

(2.) Generalised Idea of Correlation. 

Given any two variables or characters A and B, we say that they are correlated
when, with different values x of A, we do not find the same value y of B equally likely
to be associated. In other words, certain values of B are relatively more likely to
occur with the value x than others. The distribution of B's associated with a given
value x of A is termed an x-array of B's. If N pairs of A and B are taken, and $n_x$ of
these have the character A = x, these $n_x$ form the x-array of B's. This array, like any
other frequency distribution, will have its mean, which we will denote by $y_x$, and its
* Referring to equation (ii.), I propose to call curves which stop at $b_q$ skew curves of the qth order.
Thus the normal curve is a skew curve of zero order; curve of Type III. is a skew curve of the 1st order;
Types I., II., V., and VI. are of the 2nd order. I hope shortly to publish a discussion of skew curves of the
3rd order to complete the practically legitimate range of such curves.
standard deviation, which we will denote by $\sigma_{n_x}$. The mean of all the B characters
shall be $\bar y$ and their variability given by the standard deviation $\sigma_y$. Similarly $\bar x$, $\sigma_x$
will denote the mean and standard deviation of the A's, and $n_y$, $x_y$, and $\sigma_{n_y}$ the
number of individuals, the mean and the standard deviation for a y-array of A's.

Now clearly a knowledge of $y_x$ and $\sigma_{n_x}$ will not fix the B's which will be found
associated with a given A, but it will define the limits of probable or even possible
B's. The curve obtained by plotting $y_x$ to x is termed the regression curve of y on x.
A curve in which the ratio of $\sigma_{n_x}$ to the standard deviation $\sigma_y$ is plotted to x may be
termed a scedastic* curve. Since the standard deviation is always a positive
quantity, this curve always lies on one side of the axis; it is a horizontal line in the
case of normal correlation, i.e., the Gauss-Laplacian distribution of deviations, and
coincides with the axis in any case where correlation passes into causation, i.e., when
one value of B only is associated with each A.

The mean ordinate of this curve would clearly be a sort of general measure of the
degree of correlation between A and B, but it seems for many reasons better to base
our measure on the mean square of the weighted standard deviations of the arrays, or

$\sigma_a^2 = S(n_x\sigma_{n_x}^2)/N$ (xiii.).

$\sigma_a$ will thus measure the average variability in B to be found associated with any A;
its vanishing will mean that the scedastic curve as defined above will coincide with
the axis. Now let a new quantity $\eta$, defined by

$\sigma_a^2 = (1-\eta^2)\,\sigma_y^2$ (xiv.),

be introduced. Then clearly $\eta$ must lie between $\pm1$, because $\sigma_a^2$ cannot be negative,
being the sum of a number of positive squares. I term $\eta$ the correlation ratio, to
distinguish it from the correlation coefficient represented by r. When $\eta=\pm1$ the
correlation is perfect or we have causation. Further we have by a well-known
property of moments, if

$\sigma_M^2 = S\{n_x(y_x-\bar y)^2\}/N$ (xv.),

$\sigma_y^2 = \sigma_a^2+\sigma_M^2$,

or

$\eta = \sigma_M/\sigma_y$ (xvi.).

This shows us that the correlation ratio is the ratio of the variability of the means
of the x-arrays to the variability of B's in general. If $\eta = 0$, it follows that $\sigma_M$ is
zero, or from (xv.) that every $y_x=\bar y$, i.e., there is no association of B's with special
A's at all, or correlation is zero. Thus the correlation ratio $\eta$, as defined by either
(xiv.) or (xvi.), is an excellent measure of the stringency of correlation, always lying
numerically between the values 0 and 1, which mark absolute independence and
* I.e., a curve which measures the " scatter " in the arrays. 
complete causation respectively. Further, remembering the definition of r, the
coefficient of correlation, i.e.,

$r\sigma_x\sigma_y = S\{n_x(x-\bar x)(y_x-\bar y)\}/N$ (xvii.),

we have, from (xv.) and (xvii.),

$N(\eta^2-r^2)\,\sigma_y^2 = S\left[n_x(y_x-\bar y)\left\{y_x-\bar y-\frac{r\sigma_y}{\sigma_x}(x-\bar x)\right\}\right]$.

Now let

$Y = \bar y+\frac{r\sigma_y}{\sigma_x}(x-\bar x)$ (xviii.),

then (xviii.), as is well known, gives the best fitting straight line to the series of
points $y_x$ loaded with their respective $n_x$. We can now write

$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(y_x-Y)^2\}+S\{n_x(Y-\bar y)(y_x-Y)\}$.

But, using (xviii.),

$S\{n_x(Y-\bar y)(y_x-Y)\} = \frac{r\sigma_y}{\sigma_x}\,S\left\{n_x(x-\bar x)\left(y_x-\bar y-\frac{r\sigma_y}{\sigma_x}(x-\bar x)\right)\right\}$

$= 0$.

Thus the last summation vanishes, and we have

$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(y_x-Y)^2\}$ (xix.).

The right-hand side must always be positive, unless $y_x=Y$, when it is zero. Hence
we conclude that $\eta$ is always greater than r, or the correlation ratio greater than the
correlation coefficient, except in the special case when the means of the x-arrays of y's
all fall on a straight line, i.e., we have linear regression, and then the two correlation
constants are equal.

Thus the expression $(\eta^2-r^2)\,\sigma_y^2$ has an important physical meaning; it is the mean
square deviation of the regression curve from the straight line which fits this curve
most closely.* We have now freed our treatment of correlation from any condition
as to linearity of the regression, and it remains to consider the probable errors of the
various quantities dealt with.
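The relation between the correlation ratio and the correlation coefficient can be verified numerically. The following sketch computes $\eta^2$, $r^2$ and the weighted sum $S\{n_x(y_x-Y)^2\}$ in exact rational arithmetic and checks that $N(\eta^2-r^2)\sigma_y^2$ equals that sum. The function name `correlation_summary` and the small data set are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def correlation_summary(arrays):
    """arrays maps each x value to the list of y values in its x-array."""
    N = sum(len(ys) for ys in arrays.values())
    x_bar = Fraction(sum(x * len(ys) for x, ys in arrays.items()), N)
    y_bar = Fraction(sum(sum(ys) for ys in arrays.values()), N)
    var_x = sum(len(ys) * (x - x_bar) ** 2 for x, ys in arrays.items()) / N
    var_y = sum((y - y_bar) ** 2 for ys in arrays.values() for y in ys) / N
    # Means of the x-arrays, and their weighted variance sigma_M^2.
    means = {x: Fraction(sum(ys), len(ys)) for x, ys in arrays.items()}
    var_m = sum(len(arrays[x]) * (m - y_bar) ** 2 for x, m in means.items()) / N
    cov = sum(len(arrays[x]) * (x - x_bar) * (m - y_bar)
              for x, m in means.items()) / N
    eta_sq = var_m / var_y
    r_sq = cov ** 2 / (var_x * var_y)
    # Best fitting straight line Y = y_bar + slope * (x - x_bar).
    slope = cov / var_x
    resid = sum(len(arrays[x]) * (m - (y_bar + slope * (x - x_bar))) ** 2
                for x, m in means.items())
    return {"N": N, "eta_sq": eta_sq, "r_sq": r_sq, "var_y": var_y, "resid": resid}


# Hypothetical data with a curved regression: array means 2, 5, 3.
curved = {1: [1, 2, 3], 2: [4, 5, 6, 5], 3: [3, 4, 2]}
summary = correlation_summary(curved)
print(summary["eta_sq"], summary["r_sq"])
```

For these data $\eta^2 = 11/15$ and $r^2 = 1/15$, and both sides of the identity equal 15. When the array means lie on a straight line the two constants coincide and the sum vanishes.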

(3.) Probable Errors of Constants of Correlation. 

We shall first prove a number of general propositions relating to the probable 
errors of correlation constants. We first note that if n and n' be the frequencies in 

* The properties of the correlation ratio were briefly noted in a footnote to a paper by the author in
'Roy. Soc. Proc.,' vol. 71, pp. 303-4. It has been systematically used in my laboratory for some years
and determined alongside r for many distributions.

any two sub-groups of a total N, for which no member of n is a member of n', then
the standard deviation of n due to random sampling is given by

$\Sigma_n^2 = n\left(1-\frac{n}{N}\right)$ (xx.),

and the correlation between deviations in n and n' due to random sampling is given
by

$\Sigma_n\Sigma_{n'}R_{nn'} = -\frac{nn'}{N}$ (xxi.).
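The two sampling results, $\Sigma_n^2 = n(1-n/N)$ and $\Sigma_n\Sigma_{n'}R_{nn'} = -nn'/N$, can be checked by exact enumeration on a small case. The sketch below assumes that random sampling means N independent draws from a fixed population, with n and n' read as the expected sub-group frequencies; that reading, the function name and the probabilities are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def exact_count_moments(probs, draws, i, j):
    """Exact variance of count i and covariance of counts i and j,
    summed over every ordered outcome of `draws` independent draws."""
    mean_i = mean_j = mean_ii = mean_ij = Fraction(0)
    for outcome in product(range(len(probs)), repeat=draws):
        weight = Fraction(1)
        for category in outcome:
            weight *= probs[category]
        ci, cj = outcome.count(i), outcome.count(j)
        mean_i += weight * ci
        mean_j += weight * cj
        mean_ii += weight * ci * ci
        mean_ij += weight * ci * cj
    return mean_ii - mean_i ** 2, mean_ij - mean_i * mean_j


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
N = 6
n, n_prime = N * probs[0], N * probs[1]  # expected sub-group frequencies
variance, covariance = exact_count_moments(probs, N, 0, 1)
print(variance, n * (1 - n / N))
print(covariance, -n * n_prime / N)
```

The enumeration covers $3^6 = 729$ outcomes, so it runs instantly; larger N would call for the closed forms themselves.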

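Problem I. To find the correlation in deviations due to random sampling between
the number $n_{x_p}$ in the $x_p$-array of y's and the number $n_{y_s}$ in the $y_s$-array of x's.

If the symbol $\delta n$ denote the error or deviation in n, we have with an obvious
subscript notation*

$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\delta n_{x_py_3}+\ldots+\delta n_{x_py_q}$,

if there be q groups of y's, and again

$\delta n_{y_s} = \delta n_{x_1y_s}+\delta n_{x_2y_s}+\delta n_{x_3y_s}+\ldots+\delta n_{x_iy_s}$,

if there be i groups of x's.

Multiply the expressions for $\delta n_{x_p}$ and $\delta n_{y_s}$ together and we have

$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_u}\,\delta n_{x_vy_s})$,

where the summation is for every pair of values of u and v, differing from s and p.

Summing all such pairs of values for every random sample and dividing by the
number of samples taken, we have the usual definition of correlation

$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}\left(1-\frac{n_{x_py_s}}{N}\right)-S\left(\frac{n_{x_py_u}\,n_{x_vy_s}}{N}\right)$,

or,

$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}-\frac{n_{x_p}n_{y_s}}{N}$ (xxii.).

This gives $R_{n_{x_p}n_{y_s}}$, the required correlation, since $\Sigma_{n_{x_p}}$ and $\Sigma_{n_{y_s}}$ are known from (xx.).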

Problem II. To find the correlation between deviations in the total $n_{x_p}$ of any array
and in any sub-group $n_{x_py_s}$ of this array.

We have at once

$\delta n_{x_p}\,\delta n_{x_py_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_s}\,\delta n_{x_py_u})$,

where u is to be taken every value other than s in the summation term. Summing
for all random samples and dividing by their number, we have, after using results
like (xx.) and (xxi.),

$\Sigma_{n_{x_p}}\Sigma_{n_{x_py_s}}R_{n_{x_p}n_{x_py_s}} = n_{x_py_s}\left(1-\frac{n_{x_p}}{N}\right)$ (xxiii.),

which gives $R_{n_{x_p}n_{x_py_s}}$.

* $n_{xy}$ = frequency of groups with characters x and y.
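Proposition III. There is no correlation between deviations in the mean of an
x-array $y_{x_p}$ and the total number in that array.

$n_{x_p}y_{x_p} = S(n_{x_py_u}y_u)$,

$n_{x_p}\,\delta y_{x_p}\,\delta n_{x_p} = -y_{x_p}(\delta n_{x_p})^2+S(\delta n_{x_p}\,\delta n_{x_py_u}\,y_u)$.

Hence as before, using (xxiii.), &c.,

$n_{x_p}\Sigma_{y_{x_p}}\Sigma_{n_{x_p}}R_{y_{x_p}n_{x_p}} = -y_{x_p}n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)+S\left\{n_{x_py_u}\left(1-\frac{n_{x_p}}{N}\right)y_u\right\}$

$= 0$,
which proves that $R_{y_{x_p}n_{x_p}}$ is zero.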

Proposition IV. — There is no correlation between deviations in the mean of an 
x-array and in the total number in any other array. 

Proof as before. 

Proposition V. There is no correlation between deviations in the mean of one
x-array and in the mean of a second x-array.

We have

$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_u}\,y_u)-y_{x_p}\,\delta n_{x_p}$,

$n_{x_{p'}}\,\delta y_{x_{p'}} = S(\delta n_{x_{p'}y_{u'}}\,y_{u'})-y_{x_{p'}}\,\delta n_{x_{p'}}$.

Multiply these two expressions together, sum for all random samples, and divide
by the number of such samples. We find

$$\begin{aligned}
n_{x_p}n_{x_{p'}}\Sigma_{y_{x_p}}\Sigma_{y_{x_{p'}}}R_{y_{x_p}y_{x_{p'}}} = {}& -y_{x_p}y_{x_{p'}}\,n_{x_p}n_{x_{p'}}/N \\
& +y_{x_p}\,S(n_{x_p}n_{x_{p'}y_{u'}}\,y_{u'})/N \\
& +y_{x_{p'}}\,S(n_{x_{p'}}n_{x_py_u}\,y_u)/N \\
& -S'(n_{x_py_u}n_{x_{p'}y_{u'}}\,y_uy_{u'})/N .
\end{aligned}$$

The last term is $-n_{x_p}y_{x_p}\,n_{x_{p'}}y_{x_{p'}}/N$, and thus the right-hand side is identically zero. It
thus appears that there is no correlation between errors made in finding the means of
two arrays. This result is not at once obvious, although a very little consideration
shows it must be true.
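Proposition VI. To prove that the standard deviation of the mean $y_{x_p}$ of any
x-array due to random sampling equals $\sigma_{n_{x_p}}/\sqrt{n_{x_p}}$.

We have

$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_u}\,y_u)-y_{x_p}\,\delta n_{x_p}$.

Square, sum for all random samples, and divide by the number of such samples.
We have

$$\begin{aligned}
n_{x_p}^2\Sigma_{y_{x_p}}^2 = {}& y_{x_p}^2\,n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)-2y_{x_p}\,S\left\{n_{x_py_u}\left(1-\frac{n_{x_p}}{N}\right)y_u\right\} \\
& +S\left\{n_{x_py_u}\left(1-\frac{n_{x_py_u}}{N}\right)y_u^2\right\}-2S'\left\{\frac{n_{x_py_u}n_{x_py_{u'}}}{N}\,y_uy_{u'}\right\} \\
= {}& S(n_{x_py_u}y_u^2)-n_{x_p}y_{x_p}^2 \\
= {}& n_{x_p}\sigma_{n_{x_p}}^2 ,
\end{aligned}$$

where $S'$ denotes summation over every pair of different values u, u'.

Hence

$\Sigma_{y_{x_p}} = \sigma_{n_{x_p}}/\sqrt{n_{x_p}}$ (xxiv.).

Thus the probable error of the mean of an array has exactly the same form as the
probable error of the mean of a random sample of a definite number of individuals.
The array may have a variable number of individuals, but we have seen in
Proposition III. that there is no correlation between errors in its mean and errors in
the total number of individuals contained in it.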

Problem VII. To find the probable error of the standard deviation of any array.

By a precisely similar investigation to that of the previous proposition we find

$\Sigma_{\sigma_{n_{x_p}}}^2 = \dfrac{\mu_4-\mu_2^2}{4\mu_2\,n_{x_p}}$ (xxv.),

where $\mu_2$ and $\mu_4$ are the 2nd and 4th moments of the array about its mean.

This is identical with the probable error we should have if the array were a random
sample of constant size.

In many cases it will be sufficiently approximate to put $\mu_4 = 3\mu_2^2$ and we then
have

$.67449\,\Sigma_{\sigma_{n_{x_p}}} = .67449\,\dfrac{\sigma_{n_{x_p}}}{\sqrt{2n_{x_p}}}$ (xxvi.),

the well-known form for the probable error of the standard deviation of a normal
distribution of a definite number of individuals.

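Problem VIII. To find the standard deviation of the standard deviation $\sigma_M$ of the
means of the arrays due to random sampling.

Since

$N\sigma_M^2 = S\{n_{x_p}(y_{x_p}-\bar y)^2\}$,

$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(y_{x_p}-\bar y)^2\}+2S\{\delta y_{x_p}\,n_{x_p}(y_{x_p}-\bar y)\}-2\,\delta\bar y\,S\{n_{x_p}(y_{x_p}-\bar y)\}$,

the last term of which vanishes, since

$N\bar y = S(n_{x_p}y_{x_p})$.

Square the above relation, sum for all random samples, and divide by the number
of such samples.

We find

$$\begin{aligned}
4N^2\sigma_M^2\Sigma_{\sigma_M}^2 = {}& S\left\{n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)(y_{x_p}-\bar y)^4\right\} \\
& -2S'\left\{\frac{n_{x_p}n_{x_{p'}}}{N}(y_{x_p}-\bar y)^2(y_{x_{p'}}-\bar y)^2\right\} \\
& +4S\{\Sigma_{n_{x_p}}\Sigma_{y_{x_p}}R_{n_{x_p}y_{x_p}}\,n_{x_p}(y_{x_p}-\bar y)^3\} \\
& +4S'\{\Sigma_{n_{x_p}}\Sigma_{y_{x_{p'}}}R_{n_{x_p}y_{x_{p'}}}\,n_{x_{p'}}(y_{x_p}-\bar y)^2(y_{x_{p'}}-\bar y)\} \\
& +4S'\{\Sigma_{y_{x_p}}\Sigma_{y_{x_{p'}}}R_{y_{x_p}y_{x_{p'}}}\,n_{x_p}n_{x_{p'}}(y_{x_p}-\bar y)(y_{x_{p'}}-\bar y)\} \\
& +4S\{\Sigma_{y_{x_p}}^2\,n_{x_p}^2(y_{x_p}-\bar y)^2\}.
\end{aligned}$$

But $R_{n_{x_p}y_{x_p}}$, $R_{n_{x_p}y_{x_{p'}}}$, and $R_{y_{x_p}y_{x_{p'}}}$ vanish by Propositions III., IV., and V. Further, by
VI., $\Sigma_{y_{x_p}} = \sigma_{n_{x_p}}/\sqrt{n_{x_p}}$. Hence we have

$$\begin{aligned}
4N^2\sigma_M^2\Sigma_{\sigma_M}^2 = {}& S\left\{n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)(y_{x_p}-\bar y)^4\right\} \\
& -2S'\left\{\frac{n_{x_p}n_{x_{p'}}}{N}(y_{x_p}-\bar y)^2(y_{x_{p'}}-\bar y)^2\right\} \\
& +4S\{n_{x_p}\sigma_{n_{x_p}}^2(y_{x_p}-\bar y)^2\}.
\end{aligned}$$

Now let

$N\lambda_n = S\{n_{x_p}(y_{x_p}-\bar y)^n\}$

be the nth moment of the means of the arrays about their mean. Then clearly
$\lambda_2 = \sigma_M^2$. Further, since $S(n_{x_p}\sigma_{n_{x_p}}^2) = N\sigma_y^2(1-\eta^2)$, we can write

$S\{n_{x_p}\sigma_{n_{x_p}}^2(y_{x_p}-\bar y)^2\} = N\sigma_y^2(1-\eta^2)\,\lambda_2\,\chi_1$,

where $\chi_1$ is a purely numerical constant, which is equal to unity for those cases in
which there is no correlation between the standard deviation of an array and the
square of its mean's deviation from the mean. Thus finally we find

$\Sigma_{\sigma_M}^2 = \dfrac{\lambda_4-\lambda_2^2}{4N\lambda_2}+\dfrac{\chi_1\sigma_y^2(1-\eta^2)}{N}$ (xxvii.).

This enables us at once to find the probable error of the standard deviation of the
means of the arrays.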

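Proposition IX. To find the correlation between the deviations due to random
sampling in the values of $\sigma_y$ and $\sigma_M$.

We have

$N\sigma_y^2 = S\{n_{y_s}(y_s-\bar y)^2\}$,

$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar y)^2\}-2\,\delta\bar y\,S\{n_{y_s}(y_s-\bar y)\}$;

the last term vanishes because $S(n_{y_s}y_s) = N\bar y$.
Thus

$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar y)^2\}$.

But from the previous proposition

$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(y_{x_p}-\bar y)^2\}+2S\{\delta y_{x_p}\,n_{x_p}(y_{x_p}-\bar y)\}$.

Multiply these two expressions together, sum for all random samples and divide by
the number of such samples; we find

$$\begin{aligned}
4N^2\sigma_y\sigma_M\Sigma_{\sigma_y}\Sigma_{\sigma_M}R_{\sigma_y\sigma_M} = {}& S\{\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}}(y_{x_p}-\bar y)^2(y_s-\bar y)^2\} \\
& +2S\{\Sigma_{n_{y_s}}\Sigma_{y_{x_p}}R_{n_{y_s}y_{x_p}}\,n_{x_p}(y_s-\bar y)^2(y_{x_p}-\bar y)\}.
\end{aligned}$$

To evaluate this, we require to find the two correlations expressed by $R_{n_{x_p}n_{y_s}}$ and
$R_{n_{y_s}y_{x_p}}$. We will consider the two summation terms separately.

First Term.

$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\ldots+\delta n_{x_py_s}+\ldots$

$\delta n_{y_s} = \delta n_{x_1y_s}+\delta n_{x_2y_s}+\ldots+\delta n_{x_py_s}+\ldots$

$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2+S(\delta n_{x_py_{s'}}\,\delta n_{x_{p'}y_s})$,

where in the summation p' and s' are not equal to p and s.
Proceeding in the usual manner we find

$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}-\dfrac{S(n_{x_py_{s'}})\,S(n_{x_{p'}y_s})}{N}$,

where in the first sum s' is to take all possible values, and in the second p' is to take
all possible values. Thus we have

$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}-\dfrac{n_{x_p}n_{y_s}}{N}$ (xxviii.).

Substituting we find

First Term $= S_1\{n_{x_py_s}(y_{x_p}-\bar y)^2(y_s-\bar y)^2\}-S_2\left\{\dfrac{n_{x_p}n_{y_s}}{N}(y_{x_p}-\bar y)^2(y_s-\bar y)^2\right\}$.

Here both the summations are really double summations; fixing our attention on
any $x_p$, i.e., on any array of y's for a given value of x, we have first to sum for all y's
in this array, and then we have to sum for all arrays. This is the meaning of $S_1$. In
$S_2$ we are to associate every array of x's with every array of y's; hence this term will
break up at once into two factors, i.e.,

$S_2 = \dfrac{1}{N}\,S\{n_{x_p}(y_{x_p}-\bar y)^2\}\times S\{n_{y_s}(y_s-\bar y)^2\} = N\sigma_y^2\times\sigma_M^2$.

Keeping $x_p$ constant first in $S_1$, we see that

$S\{n_{x_py_s}(y_s-\bar y)^2\}$

is $n_{x_p}$ times the 2nd moment of the y's in the $x_p$ array about the mean of the system

$= n_{x_p}\{\sigma_{n_{x_p}}^2+(y_{x_p}-\bar y)^2\}$.

Combining we have

First Term $= S\{n_{x_p}(y_{x_p}-\bar y)^4\}+S\{n_{x_p}\sigma_{n_{x_p}}^2(y_{x_p}-\bar y)^2\}-N\sigma_y^2\sigma_M^2$

$= N\{\lambda_4+\sigma_y^2\sigma_M^2(1-\eta^2)\chi_1-\sigma_y^2\sigma_M^2\}$ (xxix.).

We now turn to the second term which involves the discovery of $R_{n_{y_s}y_{x_p}}$.

$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_u}\,y_u)-y_{x_p}\,\delta n_{x_p}$.

Hence

$n_{x_p}\,\delta y_{x_p}\,\delta n_{y_s} = S(\delta n_{x_py_u}\,\delta n_{y_s}\,y_u)-y_{x_p}\,\delta n_{x_p}\,\delta n_{y_s}$.

Sum for all random samples and divide by the number of such samples; we have

$n_{x_p}\Sigma_{y_{x_p}}\Sigma_{n_{y_s}}R_{y_{x_p}n_{y_s}} = n_{x_py_s}(y_s-y_{x_p})$ (xxx.).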




Missing Page 
In any other case, $\chi_2$, $\chi_1-1$, $(\mu_4-3\mu_2^2)/\mu_2^2$, $(\lambda_4-3\lambda_2^2)/\lambda_2^2$ will probably be small and
thus

Probable error of

$\eta = .67449\,(1-\eta^2)/\sqrt{N}$, nearly (xxxiv.).

This simple form suffices for many practical cases. 
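As a small worked illustration of the approximate form, probable error of $\eta$ = .67449 $(1-\eta^2)/\sqrt{N}$, the sketch below evaluates it for a given correlation ratio and sample size. The function name and the numerical inputs are hypothetical.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # probable error = 0.67449 * standard deviation


def probable_error_eta(eta, n_pairs):
    """Approximate probable error of the correlation ratio (simple form)."""
    return PROBABLE_ERROR_FACTOR * (1.0 - eta ** 2) / sqrt(n_pairs)


# Example: eta = 0.5 found from 900 pairs.
print(probable_error_eta(0.5, 900))
```

For $\eta = 0.5$ and $N = 900$ this gives about 0.0169, so a value of 0.5 would be many times its probable error.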

If greater exactitude is wanted, there is, however, no great labour in using 
(xxxiii.). We find the means and standard deviations of each array. 

Then $N\lambda_2$ and $N\lambda_4$ are the 2nd and 4th moments of the means of these arrays
about their mean.

$N\mu_2$ and $N\mu_4$ are the 2nd and 4th moments about the mean of the y-characters, and
will always be known for skew variation.

$\chi_1$ is defined by

$\chi_1 = \dfrac{S\{n_{x_p}\sigma_{n_{x_p}}^2(y_{x_p}-\bar y)^2\}}{N\sigma_y^2(1-\eta^2)\,\sigma_M^2}$ (xxxv.),

and can be easily found when the means and standard deviations of each array have
been found.

The most troublesome expression is $\chi_2$ defined by

But as we do not take usually more than 10 to 20 arrays, the discovery of their
3rd moments is not an extremely difficult task. As a rule, however, $\chi_2$ is very small
and may be fairly neglected, even when we must find $\chi_1-1$. All these points will
be dealt with in the numerical illustrations given later in this paper. At present
we note that the probable error of $\eta$ has been determined, and that its value for the
general case is not really more complex than the value of the probable error of r in
the general case, which requires the determination of product moments of the 4th
order.*

* Let $Np_{qs} = S\{n_{xy}(x-\bar x)^q(y-\bar y)^s\}$, then the probable error of r is given by

$$.67449\,\frac{r}{\sqrt N}\left\{\frac{p_{22}-3p_{11}^2}{p_{11}^2}+\frac{p_{22}-3p_{20}p_{02}}{2p_{20}p_{02}}+\frac{p_{40}-3p_{20}^2}{4p_{20}^2}+\frac{p_{04}-3p_{02}^2}{4p_{02}^2}-\frac{p_{31}-3p_{11}p_{20}}{p_{11}p_{20}}-\frac{p_{13}-3p_{11}p_{02}}{p_{11}p_{02}}\right\}^{\frac12}$$

(xxxvii.).

This agrees with the value given by Sheppard ('Phil. Trans.,' A, vol. 192, p. 128), except that the $r^2$
factor has been dropped by a printer's error in his paper. For the special case of a normal distribution, we
have easily from the equation to the normal surface

$p_{40} = 3p_{20}^2$, $p_{04} = 3p_{02}^2$, $p_{31} = 3p_{11}p_{20}$, $p_{13} = 3p_{11}p_{02}$, $(p_{22}-3p_{11}^2)/p_{11}^2 = (1-r^2)/r^2$,
and

probable error of $r = .67449\,(1-r^2)/\sqrt N$,

the well-known form ('Phil. Trans.,' A, vol. 191, p. 245).
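The reduction of the general probable error of r to the normal form can be checked numerically. The sketch below evaluates the six-term expression with product-moments $p_{qs}$ supplied as a dictionary, and confirms that normal product-moments give $.67449\,(1-r^2)/\sqrt N$. The function names are hypothetical, the normal relation $p_{22} = p_{20}p_{02}+2p_{11}^2$ is assumed from the normal surface, and the absolute value of r is taken so that the result is positive for negative correlation.

```python
from math import isclose, sqrt


def probable_error_r(p, n_pairs):
    """Probable error of r from product-moments p[(q, s)] of order up to 4."""
    p11, p20, p02 = p[(1, 1)], p[(2, 0)], p[(0, 2)]
    r = p11 / sqrt(p20 * p02)
    bracket = ((p[(2, 2)] - 3 * p11 ** 2) / p11 ** 2
               + (p[(2, 2)] - 3 * p20 * p02) / (2 * p20 * p02)
               + (p[(4, 0)] - 3 * p20 ** 2) / (4 * p20 ** 2)
               + (p[(0, 4)] - 3 * p02 ** 2) / (4 * p02 ** 2)
               - (p[(3, 1)] - 3 * p11 * p20) / (p11 * p20)
               - (p[(1, 3)] - 3 * p11 * p02) / (p11 * p02))
    return 0.67449 * abs(r) / sqrt(n_pairs) * sqrt(bracket)


def normal_product_moments(r, sx=1.0, sy=1.0):
    """Product-moments of a normal surface with correlation r (assumed forms)."""
    p20, p02, p11 = sx * sx, sy * sy, r * sx * sy
    return {(2, 0): p20, (0, 2): p02, (1, 1): p11,
            (2, 2): p20 * p02 + 2 * p11 ** 2,
            (4, 0): 3 * p20 ** 2, (0, 4): 3 * p02 ** 2,
            (3, 1): 3 * p11 * p20, (1, 3): 3 * p11 * p02}


general = probable_error_r(normal_product_moments(0.5), 400)
normal_form = 0.67449 * (1 - 0.5 ** 2) / sqrt(400)
print(general, normal_form)
```

In the normal case only the first two terms of the bracket survive, and they sum to $(1-r^2)^2/r^2$.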
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(4.) On the Higher Types of Regression. 

• We have already seen how the introduction of the correlation ratio t) enables us to 
drop the limitations associated with the Gauss-Laplacian form of frequency, and the 
Bravais correlation formulae. The fundamental step towards this advance was 
undoubtedly taken by G. U. Yule in his paper in the 'Roy. Soc. Proc.,' vol. 60, 
pp. 477 et seq., wherein he shows that if the regression be linear, the Bravais type of 
formula applied to multiple correlation is still true, although we make no assumption 
as to the form of the frequency surface. It would undoubtedly be a gain to have 
skew frequency surfaces which would describe skew correlation for the great mass of 
cases as eifectivly as the series of skew frequency curves describe skew variation, but 
although a considerable amount of progress has been made in the consideration of 
these surfaces, their full theory has not yet been worked out owing to difficulties 
of analysis, and their complete discussion must still be postponed. Yule's method 
of approaching the problem from the form of the regression curves is, however, 
available and capable of very great extension. Its chief advantage is that it 
makes little or no assumption as to the distribution of frequency ; its chief defect 
lies even in this advantage of generality : it does not enable us to predict the 
probability of an individual with a given combination of characters. This follows at 
once from the fact that we make no assumption as to the form of the distribution 
within an array. Without some theory as to variation within the array, we are 
reduced to the laborious process of calculating the standard deviation, skewness, and 
other general characters of each array, a lengthy and troublesome process compared 
with a theory which would, like the Bravais theory, give these at once in terms of a 
few constants determined from the data as a whole. 

In the great bulk of biometrical and economical enquiries, however, the regression 
does not diverge very markedly from the linear form. In the cases of non-linear
regression that I have hitherto had to deal with, I find that parabolas of the 2nd
or 3rd order will suffice as a rule to describe the deviation from linearity. If
they did not, we could, of course, use curves of higher orders, but the difficulty 
referred to in the first section of this paper at once arises : we then need to use 
in the determination moments and product-moments of such high orders that the 
probable errors of the constants are so high as to render valueless their calculation 
from such statistical data as we can hope for in most actual inquiries. In the great
bulk of investigations it is practically impossible to increase our random samples 
from 500 to 1,000 individuals up to 50,000 to 100,000. Nor in the great 
bulk of statistical cases is any such increase even desirable, for a fairly wide 
experience shows that 2nd and 3rd order parabolae amply suffice to describe the
skewness of the regression line. I shall accordingly classify skew correlation in the 
following manner : —

(a.) Linear Regression :

The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

$\bar{y}_{x_p} = a_0 + a_1 x_p$ (xxxviii.).

(b.) Parabolic* Regression :

The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

$\bar{y}_{x_p} = a_0 + a_1 x_p + a_2 x_p^2$ (xxxix.).

(c.) Cubical* Regression :

The mean of an x-array of y's, i.e., $\bar{y}_{x_p}$, is given by

$\bar{y}_{x_p} = a_0 + a_1 x_p + a_2 x_p^2 + a_3 x_p^3$ (xl.).

It is conceivable— in fact, from unpublished work already done, highly probable — 
that the theory of skew variation will give regression curves, not of the exact form 
involved in (xxxix.) or (xl.), but containing product terms in x and y. The most 
general equation to a regression curve may be taken to be of the type 

and what experience shows us is : that for the great bulk of vital phenomena it is 
sufficient to expand by Maclaurin's theorem and keep the first three or four terms. 
Indeed, in the large majority of cases, (xxxviii.) alone suffices. Hence, if (xxxix.) 
or (xl.) fit the data within the limits of random sampling, we are not injudiciously 
circumscribing future developments of the theory of skew correlation by casting our 
regression curves into the above forms. I shall deal first with the theory of cubical 
regression, for we can then obtain from this the conditions necessary for parabolic 
and linear regressions. 

I must remind the reader, however, that the form of the regression line does not in 
any way limit the nature of the distribution of the array about its mean ; the 
variability of an array, i.e., the standard deviation of an array, having for its mean 
value $\sigma_y\sqrt{1-\eta^2}$, may or may not be the same for all arrays. If it is the same, or all
arrays are equally scattered about their means, I shall speak of the system as a
homoscedastic system, otherwise it is a heteroscedastic system. The Gauss-Laplacian
correlation surface gives a homoscedastic linear system. Mr. Yule's linear regression
is not necessarily homoscedastic ; it may, however, be homoscedastic without being
normal, and then the scatter of each array is measured by $\sigma_y\sqrt{1-r^2}$. When a
system is homoscedastic, but not linear, then $\sigma_{n_{x_p}}^2 = \sigma_y^2(1-\eta^2)$, and consequently the
$\chi_1$ of (xxxv.) is equal to unity. $\chi_1 = 1$ is a necessary result of homoscedasticity.

Lastly, we want a word to express the idea of all the arrays having equal skewness, 

* 'Parabolic' and 'cubical' are here used in the narrower sense of regression curves corresponding to
ordinary parabolae of the 2nd order and of the 3rd order respectively : in both cases the axis of the
parabola being parallel to the axis of the y-character.
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or being asymmetrical in an equal degree about their means. I shall express this by 
the term homoclitic ; generally the arrays will not be equally asymmetrical round their
means, and in this case we shall speak of them as heteroclitic. If there were no
skewness in any of the arrays, then $m_3$ of (xxxvi.) would be zero for all of them.
I term arrays of no skewness isocurtic, and skew arrays allocurtic. If we supposed 
that a curve of Type III. would sufficiently express the skewness of an array, we 

should have 

$\mathrm{Sk.} = \tfrac{1}{2} m_3/\sigma_{n_{x_p}}^3$,
and therefore from (xxxvi.) 

_ 2S{n.,cr,„/(Sk.)(y.-y)} 



For a homoscedastic system we have $\sigma_{n_{x_p}} = \sigma_y\sqrt{1-\eta^2}$, and therefore

2SK(Sk)(^V^} 

and for a homoclitic system 

_ 2(Sk.)S{n.,or,,/(y.-^)} 

For a homoclitic homoscedastic system, whether isocurtic or allocurtic, 

2(Sk.)S{n.^(y.-^)} _, 

Thus $\chi_2$ is to a certain extent a measure of both homoscedasticity and homoclisy.
But as the correlation between $\sigma_{n_{x_p}}$ and $\bar{y}_{x_p} - \bar{y}$ is in most cases extremely small, while
the skewness of the array can well change its sign with arrays above or below the
mean, we can fairly consider the smallness of $\chi_2$ to be a measure of the approach to
homoclisy. I am thus inclined to speak of $\chi_1 - 1$ and $\chi_2$ as measures of heteroscedasticity
and heteroclisy. When they both vanish we have a homoscedastic homoclitic system.
For such systems $\eta$, the correlation ratio, tells us effectively the scatter of any array,
and as a rule all we want to know, in addition, is the form of the regression line.
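The remark that the array standard deviations have $\sigma_y\sqrt{1-\eta^2}$ for their mean (square) value can be checked numerically. The following is a minimal sketch only: the small correlation table and the helper name `array_stats` are assumptions made for illustration and are not data from this memoir.

```python
import math

# Hypothetical correlation table (NOT from the memoir): table[x][y] = frequency.
table = {
    1: {4: 2, 5: 6, 6: 2},
    2: {5: 3, 6: 9, 7: 3},
    3: {5: 1, 6: 4, 7: 10, 8: 5},
}

def array_stats(row):
    """Return (n, mean, variance) of one x-array of y's."""
    n = sum(row.values())
    mean = sum(y * f for y, f in row.items()) / n
    var = sum(f * (y - mean) ** 2 for y, f in row.items()) / n
    return n, mean, var

stats = {x: array_stats(row) for x, row in table.items()}
N = sum(n for n, _, _ in stats.values())
y_bar = sum(n * m for n, m, _ in stats.values()) / N

# Variance of y as a whole, and variance of the array means about y_bar.
var_y = sum(f * (y - y_bar) ** 2
            for row in table.values() for y, f in row.items()) / N
var_means = sum(n * (m - y_bar) ** 2 for n, m, _ in stats.values()) / N
eta_sq = var_means / var_y          # square of the correlation ratio

# Weighted mean of the array variances; equals sigma_y^2 (1 - eta^2) identically.
mean_array_var = sum(n * v for n, _, v in stats.values()) / N

# Ratio of each array s.d. to sigma_y * sqrt(1 - eta^2); all close to 1
# only when the system is (sensibly) homoscedastic.
sigma_a = math.sqrt(var_y * (1 - eta_sq))
ratios = {x: math.sqrt(v) / sigma_a for x, (_, _, v) in stats.items()}
```

The spread of `ratios` about unity gives a rough picture of heteroscedasticity; it is not the constant $\chi_1$ of (xxxv.), whose definition lies outside this section.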

(5.) Cubical Regression. 

We have already used the following notation 

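$N p_{qq'} = S\{n_{xy}(x-\bar{x})^q (y-\bar{y})^{q'}\}$ (xlii.).

We shall shorten our formulae if we write

$r = p_{11}/(\sigma_x\sigma_y)$, $\epsilon = p_{21}/(\sigma_x^2\sigma_y)$, $\zeta = p_{31}/(\sigma_x^3\sigma_y)$, $\theta = p_{41}/(\sigma_x^4\sigma_y)$ (xliii.).

We have already used $\mu_q$ to denote $p_{0q}$, and we shall use $\nu_q$ for $p_{q0}$. Further, we
write

$\beta_1 = \nu_3^2/\nu_2^3$, $\beta_2 = \nu_4/\nu_2^2$, $\beta_3 = \nu_3\nu_5/\nu_2^4$, $\beta_4 = \nu_6/\nu_2^3$ (xliv.).

$\sqrt{\beta_1} = \nu_3/\sigma_x^3$ will be of the same sign as $\nu_3$. These constants $\beta$ have been previously
used in the theory of skew variation.*
We shall further put

$\bar{\epsilon} = \epsilon - r\sqrt{\beta_1}$, $\bar{\zeta} = \zeta - r\beta_2$, $\bar{\theta} = \theta - r\beta_3/\sqrt{\beta_1}$ (xlv.).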

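The regularity of the forms $\bar{\epsilon}$, $\bar{\zeta}$, $\bar{\theta}$ is rather screened by the above notation, which
is introduced for brevity ; using the $p_{qq'}$ notation, we have

$\bar{\epsilon} = \dfrac{p_{21}p_{20} - p_{11}p_{30}}{\sigma_x^4\sigma_y}$, $\bar{\zeta} = \dfrac{p_{31}p_{20} - p_{11}p_{40}}{\sigma_x^5\sigma_y}$, $\bar{\theta} = \dfrac{p_{41}p_{20} - p_{11}p_{50}}{\sigma_x^6\sigma_y}$ (xlvi.),

whence the law of formation of these constants is easily seen.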
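The regression curve may now be conveniently put into the form

$\dfrac{\bar{y}_{x_p} - \bar{y}}{\sigma_y} = b_0 + b_1\dfrac{x_p - \bar{x}}{\sigma_x} + b_2\left(\dfrac{x_p - \bar{x}}{\sigma_x}\right)^2 + b_3\left(\dfrac{x_p - \bar{x}}{\sigma_x}\right)^3$ (xlvii.).

Or, multiplying by $n_{x_p}$ and summing for all arrays,

$0 = b_0 + b_2 + b_3\sqrt{\beta_1}$,

the sign of $\sqrt{\beta_1}$ being always that of the 3rd moment. Hence, measuring from
the means of the two characters, i.e., $X_p = x_p - \bar{x}$, $Y_{x_p} = \bar{y}_{x_p} - \bar{y}$, we may re-write (xlvii.)

$Y_{x_p}/\sigma_y = b_1(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - 1\} + b_3\{(X_p/\sigma_x)^3 - \sqrt{\beta_1}\}$ (xlviii.).

Now multiply by $n_{x_p}X_p/\sigma_x$ and sum for all arrays, remembering that

$N r\sigma_x\sigma_y = S(n_{xy}XY) = S(n_{x_p}X_pY_{x_p})$,
we find

$r = b_1 + b_2\sqrt{\beta_1} + b_3\beta_2$.

This enables us to get rid of $b_1$ and write (xlviii.)

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad + b_3\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\}$ (xlix.).

Now multiply by $n_{x_p}X_p^2/\sigma_x^2$ and sum for all arrays. We have

$\epsilon = r\sqrt{\beta_1} + b_2(\beta_2 - \beta_1 - 1) + b_3(\beta_3/\sqrt{\beta_1} - \beta_2\sqrt{\beta_1} - \sqrt{\beta_1})$,
or

$\bar{\epsilon} = b_2\phi_2 + b_3\phi_3$ (l.),

where

$\phi_2 = \beta_2 - \beta_1 - 1$, $\phi_3 = (\beta_3 - \beta_1\beta_2 - \beta_1)/\sqrt{\beta_1}$ (li.).

* 'Phil. Trans.,' A, vol. 186, p. 368, and A, vol. 198, p. 278.

Hence

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad + b_3\left[(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1} - \dfrac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\right]$ (lii.).

Now multiply by $n_{x_p}(X_p/\sigma_x)^3$ and sum for all arrays ; we find

$\zeta = r\beta_2 + \dfrac{\bar{\epsilon}}{\phi_2}\phi_3 + b_3\left(\phi_4 - \dfrac{\phi_3^2}{\phi_2}\right)$,

or

$(\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3)/(\phi_2\phi_4 - \phi_3^2) = b_3$ (liii.),

where

$\phi_4 = \beta_4 - \beta_2^2 - \beta_1$ (liv.).

It follows from (l.) that

$b_2 = (\bar{\epsilon}\phi_4 - \bar{\zeta}\phi_3)/(\phi_2\phi_4 - \phi_3^2)$ (lv.).

We can thus write the cubic regression curve in either of the forms*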



* The method is perfectly easy of extension, if we choose to use higher products and moments, to a
regression curve of any order, e.g.,

$Y_{x_p}/\sigma_y = b_0 + b_1(X_p/\sigma_x) + b_2(X_p/\sigma_x)^2 + \ldots + b_n(X_p/\sigma_x)^n + \ldots$

For let : $\epsilon_{q1} = S(n_{x_p}Y_{x_p}X_p^q)/(N\sigma_x^q\sigma_y)$, and $\gamma_q = \nu_q/\sigma_x^q = S(n_{x_p}X_p^q)/(N\sigma_x^q)$,

we have :

$0 = b_0 + b_2 + \gamma_3 b_3 + \ldots + \gamma_n b_n + \ldots$
$\epsilon_{11} = b_1 + \gamma_3 b_2 + \gamma_4 b_3 + \ldots + \gamma_{n+1} b_n + \ldots$
$\epsilon_{21} = b_0 + \gamma_3 b_1 + \gamma_4 b_2 + \gamma_5 b_3 + \ldots + \gamma_{n+2} b_n + \ldots$
$\ldots$
$\epsilon_{p1} = \gamma_p b_0 + \gamma_{p+1} b_1 + \gamma_{p+2} b_2 + \gamma_{p+3} b_3 + \ldots + \gamma_{n+p} b_n + \ldots$

Hence writing $\epsilon_{01}$ for 0, $\gamma_0 = 1$, $\gamma_1 = 0$, $\gamma_2 = 1$, we have

$b_n = (\epsilon_{01}\Delta_{0n} + \epsilon_{11}\Delta_{1n} + \epsilon_{21}\Delta_{2n} + \ldots + \epsilon_{p1}\Delta_{pn} + \ldots)/\Delta$,

where

$\Delta = \begin{vmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \gamma_3 & \cdots & \gamma_n & \cdots \\
\gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 & \cdots & \gamma_{n+1} & \cdots \\
\gamma_2 & \gamma_3 & \gamma_4 & \gamma_5 & \cdots & \gamma_{n+2} & \cdots \\
\vdots & & & & & & \\
\gamma_p & \gamma_{p+1} & \gamma_{p+2} & \gamma_{p+3} & \cdots & \gamma_{p+n} & \cdots \\
\vdots & & & & & &
\end{vmatrix}$,

and $\Delta_{qn}$ is the minor of the constituent in the $(q+1)$th row and $(n+1)$th column. As we have already
noted, however, solutions involving anything beyond $\gamma_5$ are hardly likely to be of practical value.

The value above for $b_n$ is the type equation given by the method of least squares, when we strike the
best fitting curve to all the entries in the correlation table. I have already pointed out that the method
of moments becomes identical with that of least squares, when we fit parabolae of any order ('Biometrika,'
vol. I., p. 271). The retention of the method of moments, however, enables us, without abrupt change of
method, to introduce the needful $\eta$, and to grasp at once the application of the proper Sheppard's
corrections. The extension of the method of least squares to continua in space has not yet, as far as I am aware,
been fully considered.
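The system of moment equations in the footnote above can be solved directly for any order. The sketch below is illustrative only: it assumes the array sizes and array means of Table I (Illustration A) as read from the printed table, takes $\sigma_y$ as quoted there, and uses a small hand-written elimination routine (`solve`, `fit`, `gamma` and `eps` are names invented for the example).

```python
# Array sizes n_p and array means of y for whorl positions x = 1..5,
# transcribed from Table I of Illustration A (assumed read correctly).
n_p = [150, 150, 150, 142, 87]
xs = [1, 2, 3, 4, 5]
y_means = [1017 / 150, 1022 / 150, 1022 / 150, 921 / 142, 537 / 87]

N = sum(n_p)
x_bar = sum(n * x for n, x in zip(n_p, xs)) / N
y_bar = sum(n * m for n, m in zip(n_p, y_means)) / N
sx = (sum(n * (x - x_bar) ** 2 for n, x in zip(n_p, xs)) / N) ** 0.5
sy = 0.897842  # sigma_y needs the full table; value quoted in Illustration A

s = [(x - x_bar) / sx for x in xs]        # X_p / sigma_x
Y = [(m - y_bar) / sy for m in y_means]   # Y_xp / sigma_y

def gamma(q):
    """gamma_q = nu_q / sigma_x^q."""
    return sum(n * t ** q for n, t in zip(n_p, s)) / N

def eps(q):
    """epsilon_{q1} = S(n Y X^q) / (N sigma_x^q sigma_y)."""
    return sum(n * yv * t ** q for n, yv, t in zip(n_p, Y, s)) / N

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(b)
    a = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(a[i][c]))
        a[c], a[piv] = a[piv], a[c]
        for i in range(c + 1, m):
            f = a[i][c] / a[c][c]
            for k in range(c, m + 1):
                a[i][k] -= f * a[c][k]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (a[i][m] - sum(a[i][k] * x[k] for k in range(i + 1, m))) / a[i][i]
    return x

def fit(order):
    """Solve eps_{p1} = sum_q gamma_{p+q} b_q for b_0 .. b_order."""
    a = [[gamma(p + q) for q in range(order + 1)] for p in range(order + 1)]
    return solve(a, [eps(p) for p in range(order + 1)])

b_line = fit(1)     # b_0 = 0 and b_1 = r
b_quartic = fit(4)  # five arrays, five constants: passes through every array mean
```

With only five arrays the quartic reproduces the array means exactly, which illustrates the point made in the opening section that a curve with enough constants interpolates rather than graduates.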

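$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad + \dfrac{\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3}{\phi_2\phi_4 - \phi_3^2}\left[(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1} - \dfrac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\right]$ (lvi.),

or

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}\phi_4 - \bar{\zeta}\phi_3}{\phi_2\phi_4 - \phi_3^2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad + \dfrac{\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3}{\phi_2\phi_4 - \phi_3^2}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\}$ (lvi.) bis.

The former arrangement of the solution, while it is apparently more cumbersome,
is, perhaps, the better, for it gives us at once the measure of the deviation from
parabolic or 2nd order regression, i.e., the approach of $\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3$ to zero. In the case
of normal correlation both $\bar{\epsilon}$ and $\bar{\zeta}$ vanish, and neglecting higher terms the condition
for linear regression is that $\bar{\epsilon} = 0$, and $\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3 = 0$, or, again, $\bar{\epsilon}$ and $\bar{\zeta} = 0$. For
material in which the x-variability is isocurtic, $\beta_1 = \beta_3 = \phi_3 = 0$, and the regression
curve takes the simple form

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}}{\phi_2}\{(X_p/\sigma_x)^2 - 1\} + \dfrac{\bar{\zeta}}{\phi_4}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x)\}$ (lvi.) ter.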
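That the two arrangements of the cubic are the same curve can be confirmed numerically. The sketch below is a hedged illustration: the input constants are arbitrary values chosen for the example (they are not the memoir's data), and the function names are hypothetical.

```python
import math

def cubic_constants(eps_bar, zeta_bar, phi2, phi3, phi4):
    """Return (b2, b3) as given by (lv.) and (liii.)."""
    det = phi2 * phi4 - phi3 ** 2
    b3 = (zeta_bar * phi2 - eps_bar * phi3) / det
    b2 = (eps_bar * phi4 - zeta_bar * phi3) / det
    return b2, b3

def first_form(s, r, eps_bar, zeta_bar, beta1, beta2, phi2, phi3, phi4):
    """Parabolic part plus cubic correction; s stands for X_p / sigma_x."""
    rb1 = math.sqrt(beta1)  # taken positive here; in general it has the sign of nu_3
    _, b3 = cubic_constants(eps_bar, zeta_bar, phi2, phi3, phi4)
    quad = s * s - rb1 * s - 1
    cub = s ** 3 - beta2 * s - rb1
    return r * s + (eps_bar / phi2) * quad + b3 * (cub - (phi3 / phi2) * quad)

def second_form(s, r, eps_bar, zeta_bar, beta1, beta2, phi2, phi3, phi4):
    """The 'bis' arrangement: separate coefficients b2 and b3."""
    rb1 = math.sqrt(beta1)
    b2, b3 = cubic_constants(eps_bar, zeta_bar, phi2, phi3, phi4)
    return (r * s + b2 * (s * s - rb1 * s - 1)
            + b3 * (s ** 3 - beta2 * s - rb1))

# Arbitrary illustrative constants (assumed, not taken from the memoir).
args = dict(r=-0.2, eps_bar=-0.12, zeta_bar=-0.04, beta1=0.017,
            beta2=1.83, phi2=0.813, phi3=0.29, phi4=0.61)
grid = [-2.0, -1.0, 0.0, 0.5, 1.5]
gaps = [abs(first_form(s, **args) - second_form(s, **args)) for s in grid]

# Relation (l.): eps_bar = b2*phi2 + b3*phi3.
b2, b3 = cubic_constants(args["eps_bar"], args["zeta_bar"],
                         args["phi2"], args["phi3"], args["phi4"])
residual_l = args["eps_bar"] - (b2 * args["phi2"] + b3 * args["phi3"])
```

Every entry of `gaps` and the value of `residual_l` should vanish to rounding error, since the second arrangement merely collects the quadratic bracket.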

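We now turn to express these relations in terms of the correlation ratio $\eta$.
Multiply (lvi.) by $n_{x_p}Y_{x_p}/\sigma_y$, and sum for all arrays, we obtain

$\eta^2 = r^2 + \dfrac{\bar{\epsilon}}{\phi_2}(\epsilon - \sqrt{\beta_1}\,r) + \dfrac{\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3}{\phi_2\phi_4 - \phi_3^2}\left[\zeta - \beta_2 r - \dfrac{\phi_3}{\phi_2}(\epsilon - \sqrt{\beta_1}\,r)\right]$,

whence results

$\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2 = \dfrac{(\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3)^2}{\phi_2\phi_4 - \phi_3^2}$ (lvii.).

(lvii.) is a necessary condition of cubical regression.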

It is of course not a sufficient condition, as we ought to show that $b_4$, $b_5$, &c., all
vanish, and thus any number of conditions may be found. For example, multiply by
$n_{x_p}X_p^4/\sigma_x^4$ and sum for all arrays, then

9294—93 9294—98 V/Sj 

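is also a necessary condition. Here $\beta_5 = \nu_3\nu_7/\sigma_x^{10}$. But the high as well as complicated
value of the probable errors of such expressions renders it idle to consider them in
practice.

Substituting (lvii.) in (lvi.) we have :

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad \pm \sqrt{\dfrac{\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2}{\phi_2\phi_4 - \phi_3^2}}\left[(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1} - \dfrac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\right]$ (lix.).

Which sign is to be given to the root will often be visible on inspection of the
observations. Otherwise the sign of the root must be the same as that of
$\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3$.
(lix.) will save the calculation of $\bar{\zeta}$ if the root-sign can be found by inspection.
Finally there is a third form into which we may put the cubic. Eliminate $\phi_2\phi_4 - \phi_3^2$
from (lix.) by aid of (lvii.) and it becomes

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}\bar{\zeta} - \phi_3(\eta^2 - r^2)}{\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$
$\qquad + \dfrac{\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2}{\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\}$ (lx.).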

At first sight this might appear to be the best form of the cubic, because it does
not involve the 6th moment of the variable x. But this is very far from being the
case in actual practice. The reason is simply this, $\bar{\epsilon}$, $\bar{\zeta}$ and $\eta^2 - r^2$ are in most cases
very small (they vanish in normal correlation) relatively to $\phi_2$ and $\phi_4$. Hence both
numerators and denominators of the coefficients of the square and cubic terms are
small quantities, and the coefficients accordingly subject to large probable errors. For
this reason (lx.) was found in actual practice to be of no service. Of the other two
forms (lvi.) and (lix.), neither of which suffers from this defect, $\phi_2\phi_4 - \phi_3^2$ being always
large relative to the numerators, (lix.) while involving a 6th moment does not
involve a 4th product, $\zeta$, and experience shows that the former is on the whole
easier to determine and more exact than the latter. Hence (lix.) seems the preferable
form, even if it be needful in certain cases to determine $\bar{\zeta}$ in order to fix the
sign of the radical. The cubic regression curve thus demands a knowledge of the
correlation ratio $\eta$, of the " cubic product " $\bar{\epsilon}$ and the sign by inspection or calculation
of $\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3$. Besides this, we require the first six moments of the independent
variable x. Of course if the regression of x on y be required, as well as that of
y on x, the second correlation ratio and cubic product as well as the first six moments
of y must be found. It is rare, however, that both regression curves are needed for
a single enquiry.

As to the general form of (lix.), we note that there will always be a real point of 
inflexion given by 

^/o-^=h Ms-^)/Mi) (Ixi.), 

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where 

and further that there may be two points of horizontality given by a certain quadratic.
Thus, in general, the regression line will tend to be part of an S-shaped curve. The
horizontal points may be imaginary, or, if real, either they or the point of inflexion
may be far beyond the portion of the curve which crosses the observed field of
frequency. If we consider, however, the slope of the regression curve to measure
the regression in the neighbourhood of any point, we note that the regression is a
maximum at the point given by (lxi.), and grows smaller and smaller towards the two
points of horizontality, i.e., points of complete local independence of the two
characters. These are not unfamiliar features in certain practical cases of skew
correlation,* and accordingly the cubic regression curve provides us with a ready
means of describing regression phenomena, which cannot be dealt with by the simple
line or the parabola.

It may of course be suggested that a quartic or quintic curve would give a
better result than a cubic. The answer to this is : Possibly, but the high moments
and products required render it impossible to deal even superficially with the probable
errors of the constants involved. The calculation of the probable error of $\eta$ is a
sufficiently stiff task in the general case. To test the probable error of a condition
like (lvii.), to say nothing of one like (lviii.), would involve an immense amount of
work, since we should want the correlation of errors in $\eta$, $\bar{\epsilon}$, $\bar{\zeta}$, and $\bar{\theta}$. Speaking with
some experience of practical statistical possibilities, I think the tendency to use very
high moments or product-moments must be curtailed to the minimum of actual needs.
We cannot deny the existence of skew variation, nor of the sensible curvature of
regression lines. We must admit their existence as the result of statistical experience.
This existence involves a great widening of the old frequency notions and the need
for a new means of description. But we must remember that statistics are essentially
a practical study, the art of describing by a few numerical constants observational
experience, and we must curtail at every turn the desire to run riot in mathematical
formulae, which cannot be generally applied in actual practice.† Still I propose later
in this paper to deal with the general formulae for quartic regression.

(6.) Parabolic Regression. 

For a parabolic system $b_3$ must vanish, or nearly vanish. Hence we have from
(liii.) and (lvii.)

$\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3 = 0$ (lxii.),

$\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2 = 0$ (lxiii.).

* Compare for example the regression line of mean age of bridegroom for actual age of bride,
which gives a typical S-shaped curve. See 'Biometrika,' vol. II., p. 20.
† These remarks have special reference to the points dealt with on p. 6.

From these conditions we find

$\bar{\epsilon} = \pm\sqrt{\phi_2(\eta^2 - r^2)}$, $\bar{\zeta} = \bar{\epsilon}\phi_3/\phi_2$.

These give for the form of the parabolic regression curve

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \dfrac{\bar{\epsilon}}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$ (lxiv.),

or

$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) \pm \sqrt{\dfrac{\eta^2 - r^2}{\phi_2}}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}$ (lxv.).

The latter form, besides the correlation coefficient and correlation ratio, requires only
a knowledge of the skew variation constants $\beta_1$ and $\beta_2$, and is therefore very easy to
determine. Except for very nearly linear regression, there can be no doubt as to the
sign of $\sqrt{\eta^2 - r^2}$, as we can tell at once whether the parabola ought to be concave or
convex to the x-axis. In other cases the sign of $\sqrt{\eta^2 - r^2}$ must be taken to coincide
with that of $\bar{\epsilon}$, which must therefore be found. It will then be as easy to use (lxiv.)
as (lxv.), although probably $\eta$ and r can be found with less error than $\bar{\epsilon}$.

It is thus quite easy to allow for such curvature of the regression line as can be
expressed by a parabola of the 2nd order of the type considered.

We notice at once that the regression curve does not pass through the mean of the 
two characters. Or, an individual with the mean of one character will most probably 
not have the mean of a second character. This is a rather important result, which 
follows at once for nearly all types of skew correlation. 

It will be seen, for example, that Quetelet's " mean man," defended by Professor
Edgeworth as theoretically justifiable, depends entirely on human characters giving
linear regression curves. Such linear curves are certainly given by many pairs of 
characters, e.g., cranial and body measurements, but there are certainly other 
characters for which regression ceases to be sensibly linear, and the conception of the 
" mean man " in this case fails. For example, if age be considered as a character, 
then the regression is certainly not linear, and the individual of mean age will not 
necessarily have either the mean physical or psychical characters. This seems of 
some importance for the general conception of " type," if by type we denote the mean, 
for probably there are other characters than age for which regression is skew. 

The regression, i.e., $dY_{x_p}/dX_p$, will be zero for a point $X_p$ for which

$X_p/\sigma_x = \tfrac{1}{2}\{\sqrt{\beta_1} \mp r\sqrt{\phi_2/(\eta^2 - r^2)}\}$ (lxvi.),

the sign of the root being determined as before. Clearly, therefore, unless r be very
small, or $\eta^2$ diverges very sensibly from $r^2$, this point of zero regression may correspond
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to a very large abscissa, and in some cases will lie entirely outside the range of 
observable frequency. 

The parabola of regression cuts the line of regression, i.e., the line of best fit to
the series of regression points, or to the means of the x-arrays, in two points
determined by the quadratic equation

$(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1 = 0$,

or

$X_p/\sigma_x = \tfrac{1}{2}\{\sqrt{\beta_1} \pm \sqrt{\beta_1 + 4}\}$ (lxvii.).

These points are always real, and correspond, if regression be truly parabolic, to
the same values of the x-character, whatever be the y-character of which we are
considering the correlation. In the case of normal variation of the x-character
only, these are the points of inflexion of the x-distribution.
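The two crossing points of (lxvii.) and the point of zero regression can be located numerically. The sketch below is illustrative: it assumes the woodruff constants quoted in Illustration A further on ($\beta_1$, $\phi_2$, r, $\eta$, the mean position and $\sigma_x$), and takes the root in (lxv.) negative because the quadratic constant $\bar{\epsilon}$ is negative for that material.

```python
import math

# Constants quoted in Illustration A (woodruff); assumed read correctly.
beta1 = 0.017027
phi2 = 0.811740
r, eta = -0.207579, 0.249911
x_bar, sx = 2.802651, 1.336887

rb1 = math.sqrt(beta1)  # positive, since nu_3 > 0 for the whorl positions

# Where the parabola meets the straight regression line: roots of
# s^2 - sqrt(beta1) s - 1 = 0, with s = X_p / sigma_x.
s_plus = 0.5 * (rb1 + math.sqrt(beta1 + 4))
s_minus = 0.5 * (rb1 - math.sqrt(beta1 + 4))
crossings = [x_bar + sx * s for s in (s_minus, s_plus)]

# Coefficient of the quadratic bracket in the parabola; sign follows eps_bar.
c = -math.sqrt((eta ** 2 - r ** 2) / phi2)

# Zero slope of r*s + c*(s^2 - rb1*s - 1):  r + c*(2s - rb1) = 0.
s_zero = 0.5 * (rb1 - r / c)
x_zero = x_bar + sx * s_zero
```

On these assumptions the crossings fall between the first and second whorls and between the fourth and fifth, and `x_zero` comes out close to the position of the maximum stated in Illustration A.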

(7.) Linear Regression. 

In this case it is necessary that both $b_2$ and $b_3$ vanish within the limits of random
sampling, and, although these are not theoretically sufficient — for a whole series of 
relations between the higher product-moments could be written down* — they are for 
practical purposes sufficient. 

Hence we have the following conditions for linear regression : — 

$\eta^2 = r^2$ (lxviii.),

or, the coefficient of correlation, without regard to sign, should be equal to the
correlation ratio. Further $\bar{\epsilon}$ should be zero, or

$p_{21}p_{20} - p_{11}p_{30} = 0$ (lxix.).

The theory of linear regression is so familiar that it need not be further discussed
here. In the actual practice of statistics, the determination of the means of the
x-arrays and the drawing of the regression line will often suffice to show the fairly
trained eye whether the deviations from it are random or not. If they are not
random, then we must proceed to the determination of $\eta$ and of the higher
product-moments.

The following are numerical examples of skew correlation, selected to illustrate the 
theory developed above. 

* For example, it is necessary in most cases that $\bar{\zeta}$ should vanish. In the instance of that very special
case of linear regression, the Gauss-Laplacian normal frequency, it is easy to show that the constants $\bar{\epsilon}$, $\bar{\zeta}$
both vanish as well as $\eta^2 = r^2$.

Statistical Illustrations.

(8.) Illustration A. — On the Skew Correlation between Number of Branches to the 
Whorl and Position of the Whorl on the Spray in the case of Asperula odorata. 

In this case the material was collected in a lane near Horsham, Sussex, at 
Whitsuntide, 1903, by Miss M. Radford. There were 150 independent sprays, the
woodruff had just flowered, and the whorls were counted from the flower downwards. 
Being early in the season, the maximum number of whorls was five, and, in some 
cases, not even as many were available. The material was counted and tabled by 
the author, and the results are exhibited in the table below : — 





Table I. — Correlation of Whorl-Branches and Position of Whorl.

x (Whorl)    Number of branches in whorl       $n_p$    $\bar{y}_{x_p}$    $\sigma_{n_p}$    $m_2$    $m_3$
               4     5     6     7     8
First          —     3    66    42    39        150     6.7800     .8553     .7316    .1535
Second         —     3    61    47    39        150     6.8133     .8437     .7117    .0985
Third          —     6    60    40    44        150     6.8133     .9047     .8185    .0383
Fourth         1    12    68    39    22        142     6.4859     .8780     .7709    .1347
Fifth          1    13    53    10    10         87     6.1724     .8605     .7404    .4049
Totals         2    37   308   178   154        679     6.6554       —         —        —


We require the regression curve giving the probable number of branches for a 
given whorl. 
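The array constants at the right of Table I follow directly from the frequencies. The sketch below recomputes them; it assumes the table is read row by row as printed, with the two whorls of four branches placed in the fourth and fifth positions (the only placing that agrees with the row totals 142 and 87). The helper name `array_moments` is invented for the example.

```python
import math

# counts[x][y] = number of whorls in position x having y branches (Table I).
counts = {
    1: {5: 3, 6: 66, 7: 42, 8: 39},
    2: {5: 3, 6: 61, 7: 47, 8: 39},
    3: {5: 6, 6: 60, 7: 40, 8: 44},
    4: {4: 1, 5: 12, 6: 68, 7: 39, 8: 22},
    5: {4: 1, 5: 13, 6: 53, 7: 10, 8: 10},
}

def array_moments(row):
    """Return (n, mean, s.d., m2, m3) for one array, moments about its own mean."""
    n = sum(row.values())
    mean = sum(y * f for y, f in row.items()) / n
    m2 = sum(f * (y - mean) ** 2 for y, f in row.items()) / n
    m3 = sum(f * (y - mean) ** 3 for y, f in row.items()) / n
    return n, mean, math.sqrt(m2), m2, m3

rows = {x: array_moments(row) for x, row in counts.items()}

N = sum(v[0] for v in rows.values())
x_mean = sum(x * v[0] for x, v in rows.items()) / N      # mean position
y_mean = sum(v[0] * v[1] for v in rows.values()) / N     # mean number of branches

# Skewness of each array on the Type III convention, (1/2) m3 / m2^(3/2).
skewness = {x: 0.5 * v[4] / v[3] ** 1.5 for x, v in rows.items()}
```

No Sheppard's correction is applied here, in keeping with the treatment of this illustration in the text.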

Dealing first with the skew variation in position, a purely arbitrary system 
depending solely on the number of whorls dealt with in each position, we find, not 
using Sheppard's correction,* 



Mean = 2.802,651, $\nu_2$ = 1.787,268, $\nu_5$ = 2.799,638,
$\sigma_x$ = 1.336,887, $\nu_3$ = .311,783, $\nu_6$ = 22.678,308,
$\nu_4$ = 5.841,682.

Hence we determine

$\beta_1$ = .017,027, $\phi_2$ = .811,740,
$\beta_2$ = 1.828,767, $\phi_3$ = .286,465,
$\beta_3$ = .085,545, $\phi_4$ = .610,879,
$\beta_4$ = 3.972,295, and $\sqrt{\beta_1}$ = + .130,487.


* The numbers are tabulated to six places, because we cannot be sure that the final calculations are for 
the data true to two places, which is all we finally retain unless this is done. Any number of figures can 
really be retained with perfect ease when the work is done on a calculator. 
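The constants $\beta$ and $\phi$ follow mechanically from the moments $\nu_2$ to $\nu_6$ by (xliv.), (li.) and (liv.). A minimal sketch, assuming only the moment values quoted above:

```python
import math

# Moments of the whorl position about its mean, as quoted in the text.
nu2, nu3, nu4, nu5, nu6 = 1.787268, 0.311783, 5.841682, 2.799638, 22.678308

beta1 = nu3 ** 2 / nu2 ** 3
beta2 = nu4 / nu2 ** 2
beta3 = nu3 * nu5 / nu2 ** 4
beta4 = nu6 / nu2 ** 3

# sqrt(beta1) carries the sign of the third moment.
root_beta1 = math.copysign(math.sqrt(beta1), nu3)

phi2 = beta2 - beta1 - 1
phi3 = (beta3 - beta1 * beta2 - beta1) / root_beta1
phi4 = beta4 - beta2 ** 2 - beta1

# Denominator common to the cubic-regression constants.
det = phi2 * phi4 - phi3 ** 2
```

The quantity `det`, i.e. $\phi_2\phi_4 - \phi_3^2$, is the one the text describes as always large relative to the numerators of the cubic coefficients.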
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We now turn to the skew variation in the number of branches to the whorl, and 
get the following constants : — 

Mean = 6.655,375, $\mu_2$ = .806,124,

$\sigma_y$ = .897,842, $\mu_3$ = .132,090,

$\mu_4$ = 1.138,410.

The values of $\bar{y}_{x_p}$, $m_2$, and $m_3$ are given in the table above. Using them we find

$\sigma_M$ = .224,377, $\eta$ = .249,911, $\sigma_a = \sigma_y\sqrt{1-\eta^2}$ = .869,355,

X2=V= -050,345, X^= -007,474, xi = "990.862, X2=- '059,851. 

These give by (xxxiii.), showing the numerical contribution of each term,

$\Sigma_\eta^2 = \dfrac{1}{N}\{.878,991 - .010,323 - .000,888 - .007,231 + .013,578\}$,

or the probable error of $\eta$ = .0242.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found for
its value .0243. It is clear that for this special case the simple formula (xxxiv.) is
amply sufficient, the small terms almost cancelling.
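The correlation ratio and its probable error can be recomputed from Table I. The sketch below makes two assumptions that should be read as such: that (xxxiv.), which lies outside this section, is the simple formula $.67449(1-\eta^2)/\sqrt{N}$, and that the bracket of five contributions printed above is N times the squared standard deviation of $\eta$. Both reproduce the figures quoted in the text.

```python
import math

# Table I frequencies: counts[x][y] (positions 1-5, branches 4-8).
counts = {
    1: {5: 3, 6: 66, 7: 42, 8: 39},
    2: {5: 3, 6: 61, 7: 47, 8: 39},
    3: {5: 6, 6: 60, 7: 40, 8: 44},
    4: {4: 1, 5: 12, 6: 68, 7: 39, 8: 22},
    5: {4: 1, 5: 13, 6: 53, 7: 10, 8: 10},
}

n_p = {x: sum(row.values()) for x, row in counts.items()}
N = sum(n_p.values())
means = {x: sum(y * f for y, f in row.items()) / n_p[x] for x, row in counts.items()}
y_bar = sum(n_p[x] * means[x] for x in counts) / N

# Standard deviation of y as a whole and of the array means about y_bar.
var_y = sum(f * (y - y_bar) ** 2 for row in counts.values() for y, f in row.items()) / N
var_M = sum(n_p[x] * (means[x] - y_bar) ** 2 for x in counts) / N
sigma_y, sigma_M = math.sqrt(var_y), math.sqrt(var_M)

eta = sigma_M / sigma_y                       # correlation ratio
sigma_a = sigma_y * math.sqrt(1 - eta ** 2)   # mean scatter of an array

# Simple probable error (assumed form of (xxxiv.)).
pe_simple = 0.67449 * (1 - eta ** 2) / math.sqrt(N)

# Fuller value from the five contributions printed in the text (assumed to be
# N times the squared standard deviation of eta).
terms = [0.878991, -0.010323, -0.000888, -0.007231, 0.013578]
pe_full = 0.67449 * math.sqrt(sum(terms) / N)
```

The first contribution, .878,991, agrees with $(1-\eta^2)^2$, which is why the simple formula lies so close to the fuller one here.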

We see that $\chi_1$ is almost unity, and the graph of $\sigma_{n_{x_p}}/\sigma_y$ shows indeed that the system
is sensibly homoscedastic. $\chi_2$ is small, but a glance at the graph of the clitic curve
on Diagram I. shows that we can hardly treat the system as homoclitic, the changes
in the skewness forming a fairly uniform curve.*

For practical purposes, we may treat the variability of the number of branches in
any array as sufficiently closely given by $\sigma_y\sqrt{1-\eta^2}$.

We now turn to the product-moments† and find

$p_{11}$ = -.249,160, $p_{31}$ = -.896,415,
$p_{21}$ = -.236,289, $p_{41}$ = -1.210,225.

* Throughout these illustrations the clitic curve is plotted by calculating the skewness of the arrays
from $\tfrac{1}{2}m_3/(m_2)^{3/2}$. See p. 23.

† In calculating these products referred to the centroid from those referred to any axes, generally
corresponding to whole numbers in the table, the following reduction formulae will be found useful.
We take $N\Pi_{qq'} = S\{n_{xy}\,x'^q y'^{q'}\}$, x' and y' being measured from any axes ; further, $\bar{x}'$, $\bar{y}'$ are the distances of the
means from these axes, and $\nu_2$, $\nu_3$, $\nu_4$ the moments of the x-character about its mean as tabled above.

$p_{11} = \Pi_{11} - \bar{x}'\Pi_{01}$, $p_{21} = \Pi_{21} - 2\bar{x}'\Pi_{11} + \bar{x}'^2\Pi_{01} - \bar{y}'\nu_2$,

$p_{31} = \Pi_{31} - 3\bar{x}'\Pi_{21} + 3\bar{x}'^2\Pi_{11} - \bar{x}'^3\Pi_{01} - \bar{y}'\nu_3$,

$p_{41} = \Pi_{41} - 4\bar{x}'\Pi_{31} + 6\bar{x}'^2\Pi_{21} - 4\bar{x}'^3\Pi_{11} + \bar{x}'^4\Pi_{01} - \bar{y}'\nu_4$.

The p's should be further corrected for grouping by Sheppard's corrections (given on my p. 36), provided
there be high contact at the contour of the surface of frequency. Sheppard's corrections have not in this
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These lead to 

r = -.207,579, $\bar{\epsilon}$ = -.120,164, $\bar{\zeta}$ = -.038,241, $\bar{\theta}$ = -.285,890.

Thus all the constants are determined. 
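The four constants can be recomputed from the product-moments by the definitions (xliii.) and (xlv.). A minimal sketch, assuming the product-moments, standard deviations and x-moments exactly as quoted in this illustration:

```python
# Values quoted in Illustration A (woodruff).
sx, sy = 1.336887, 0.897842
nu2, nu3, nu4, nu5 = 1.787268, 0.311783, 5.841682, 2.799638
p11, p21, p31, p41 = -0.249160, -0.236289, -0.896415, -1.210225

# Skew-variation constants of the x-character needed below.
root_beta1 = nu3 / sx ** 3
beta2 = nu4 / nu2 ** 2
beta3 = nu3 * nu5 / nu2 ** 4

# (xliii.): standardised product-moments.
r = p11 / (sx * sy)
eps = p21 / (sx ** 2 * sy)
zeta = p31 / (sx ** 3 * sy)
theta = p41 / (sx ** 4 * sy)

# (xlv.): the same with the part due to linear regression removed.
eps_bar = eps - r * root_beta1
zeta_bar = zeta - r * beta2
theta_bar = theta - r * beta3 / root_beta1
```

Each barred constant is the corresponding product-moment less the value it would take if the regression were strictly linear with slope r, which is why all three vanish for linear regression.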
We find

$\eta^2 - r^2$ = .019,367,

$\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2$ = .001,281,

$\phi_2(\eta^2 - r^2) - \bar{\epsilon}^2 - (\bar{\zeta}\phi_2 - \bar{\epsilon}\phi_3)^2/(\phi_2\phi_4 - \phi_3^2)$ = .000,276.

These should be respectively zero for linear, parabolic, and cubical regressions. It
will be seen that they are satisfied with increasing closeness ; we might well be
satisfied even with the parabolic regression curve. The following are the regression
curves determined, $\bar{y}_{x_p}$ being the actual number of branches in the whorl
(= 6.655,375 + $Y_{x_p}$), and $x_p$ the actual position of the whorl : —

(a.) Straight line :

$\bar{y}_{x_p}$ = 7.046,087 - .139,408 $x_p$.

(b.) Parabola from (lxv.) :

$\bar{y}_{x_p}$ = 6.794,052 - .125,872 $X_p$ - .077,592 $X_p^2$ ;
or,

$\bar{y}_{x_p}$ = 6.853,561 - .077,592 ($x_p$ - 1.991,535)$^2$.

This clearly gives a maximum number of branches, 6.8536, corresponding to
$x_p$ = 1.9915, a value within the limits of observation.
(c.) Cubic from (lix.) :

$\bar{y}_{x_p}$ = 6.799,399 - .192,489 $X_p$ - .084,230 $X_p^2$ + .020,915 $X_p^3$.

Here $X_p$ is measured from the mean position = $x_p$ - 2.802,651, and $\bar{y}_{x_p}$ is, as before,
the total number of branches for the given position.

Condition (lvii.) is so closely satisfied that we shall here get sensibly as good
results from (lix.) as from (lvi.).
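The coefficients of the three curves can be rebuilt from the constants already found. The sketch below is illustrative only: it assumes the constants quoted in this illustration, recomputes $\bar{\zeta}$ from $p_{31}$ so as to fix the sign of the radical, and expresses the parabola and cubic in powers of $X_p$ measured from the mean position. The tuple names are invented for the example.

```python
import math

# Constants quoted in Illustration A (woodruff).
x_bar, y_bar = 2.802651, 6.655375
sx, sy = 1.336887, 0.897842
r, eta = -0.207579, 0.249911
beta1, beta2 = 0.017027, 1.828767
phi2, phi3, phi4 = 0.811740, 0.286465, 0.610879
eps_bar = -0.120164
p31 = -0.896415

rb1 = math.sqrt(beta1)                        # nu_3 > 0, so the root is positive
zeta_bar = p31 / (sx ** 3 * sy) - r * beta2   # from (xliii.) and (xlv.)

# (a.) Straight line in the actual position x_p.
slope = r * sy / sx
intercept = y_bar - slope * x_bar

# (b.) Parabola: the root takes the sign of eps_bar.
c2 = math.copysign(math.sqrt((eta ** 2 - r ** 2) / phi2), eps_bar)
parabola = (y_bar - sy * c2,                  # constant term
            sy * (r - c2 * rb1) / sx,         # coefficient of X_p
            sy * c2 / sx ** 2)                # coefficient of X_p^2

# (c.) Cubic: the root takes the sign of zeta_bar*phi2 - eps_bar*phi3.
rad = (phi2 * (eta ** 2 - r ** 2) - eps_bar ** 2) / (phi2 * phi4 - phi3 ** 2)
b3 = math.copysign(math.sqrt(rad), zeta_bar * phi2 - eps_bar * phi3)
a2 = (eps_bar - b3 * phi3) / phi2             # net coefficient of (X_p/sigma_x)^2
cubic = (y_bar + sy * (-a2 - b3 * rb1),       # constant term
         sy * (r - a2 * rb1 - b3 * beta2) / sx,
         sy * a2 / sx ** 2,
         sy * b3 / sx ** 3)
```

Working to six figures from the rounded constants, the results should agree with the printed coefficients to about three decimal places; small differences in the later figures are to be expected.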

In the table below and in the curves of Diagram I. the values of the mean of 
the arrays, as found from line, parabola, and cubic, are given and compared with 
observation. 

case been used, as this condition is not fulfilled. The axes x', y' actually taken for woodruff were those 
through the third whorl and through six branches. 

An obvious warning about the signs of the sums of the products may be given which may save
computators some trouble. The axes being taken positive, as in the accompanying
figure, then the sums of the products for $\Pi_{11}$ and $\Pi_{31}$ are positive in the 1st and
3rd, negative in the 2nd and 4th quadrants. For $\Pi_{21}$ and $\Pi_{41}$ they are positive
in the 1st and 4th quadrants and negative in the 2nd and 3rd quadrants. In
the figure the axes are taken so as to suit the x and y-directions of the table on
p. 31. Care must, of course, be paid to this point. The products may also
be found from the $\bar{y}_{x_p}$'s in the manner indicated on p. 35, footnote. They were thus verified in this case.

[Figure: the axes +x and +y with the quadrants numbered (1st, 2nd and 4th are labelled in the original).]
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Table II. — Mean Branches to each Whorl. 



$x_p$ =                            0.      1.      2.      3.      4.      5.      6.

$\bar{y}_{x_p}$ from line          7.046   6.907   6.767   6.628   6.488   6.349   6.210
$\bar{y}_{x_p}$ from parabola      6.546   6.777   6.854   6.775   6.541   6.151   5.607
$\bar{y}_{x_p}$ from cubic         6.117   6.750   6.889   6.758   6.443   6.192   6.007
Observed                             —     6.780   6.813   6.813   6.486   6.172     —




I think we may safely say that in the relationship of branches to position of the 
whorl in woodruff we have a case of homoscedastic correlation, which is effectively 
described by a parabolic regression curve. Thus, in a case of this kind, it is only 
needful, besides the moments up to the fourth of the x-character, to find the 
correlation coefficient r and the correlation ratio $\eta$.



(9.) Illustration B. — On the Correlation between Age and Head Height in Girls. 



The data for this are taken from my School Measurement series, and involve the 
auricular heights of 2272 girls between the ages of 3 and 22. There was considerable 
paucity of material at the extreme ends of the range, and accordingly as our correlation 
curves are all obtained by weighting the observations, we can hardly expect good fits 
near 3 or 22 years of age. The actual correlation table is given as Table III. 
Sheppard's corrections were applied throughout, and the unit of height is 2 millims.

In the first place the means, standard deviations, and 3rd moments of all the arrays
of heights for different years of age were determined. These are given at the foot of
Table III., but in actually calculating the constants more places of decimals were
used. Then the first six moments of the frequency of the ages were found and the
first four moments of the height frequencies. These are the x and y-frequencies.
They give us : — 
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3-4. 






20-21. 


21-22. 


22-23. 


Totals. 






millims. 
















minima. 






102 -25-104 -25 


— 






— 


— 


— 


2 


102 -25-104 -25 






104 -25-106 -25 


— 


1 




— 


— 


— 


10 


104 -25-106 -25 






106 -25-108 -25 


— 


i 


— 


— 


— 


10 


106 -25-108 -25 






108 -25-110 -25 


— 






— 


— 


— 


27 


108 -25-110 -25 






110 -25-112 -25 


— 






— 


— 


— 


56 


110 -25-112 -25 






112 -25-114 -25 


— 


1 


— 


— 


— 


59 


112 -25-114 -25 






114-25-116-25 


1 






1 


— 


— 


115 


114 -25-116 -25 






116 -25-118 -25 


— 




— 


1 


— 


142 


116 -25-118 -25 






118 -25-120 -25 


— 






1 


— 


— 


244 


118 -25-120 -25 




1 


120 -23-122 -25 


— 






— 


3 


— 


265 


120 -25-122 -25 






122-25-124-25 


— 






2 


— 


1 


261 


122 -25-124 -25 


'g- 


e4-( 




















a- 


4 


124 -25-126 -25 
126 -25-128 -25 


— 






1 


1 


1 


265 
219 


124-25-126-25 
126 -25-128 -25 






128 -25-130 -25 


— 






1 


1 


— 


197 


128 -25-130 -25 






130 -25-132 -25 


— 






1 


1 


— 


131 


130 -25-132 -25 






132 -25-134 -25 


— 






— 


— 


— 


88 


132 -25-134 -25 






134 -25-136 -25 


— 






— 


— 


— 


77 


134 -25-136 -25 






136 -25-138 -25 


— 






— 


— 


— 


52 


136 -25-138 -25 






138 -25-140 -25 


— 






— 


— 


— 


20 


138 -25-140 -25 






140 -25-142 -25 


— 






— 


— 


— 


16 


140 -25-142 -25 






142 -25-144 -25 


— 






— 


— 


— 


11 


142 -25-144 -25 






144 -25-146 -25 


— 






— 


1 


— 


4 


144 -25-146 -25 






146 -25-148 -25 


— 




— 


— 


— 


— 


1 


146 -25-148 -26 




Totals 


1 




7 


8 


2 


2272 


Totals. 




Means 1 
in r 


115 -2500 


11 


r 


123 -8214 


126 -5000 


125 -2500 


124 -0467 


Means 
1 ^^ 




1-millim. units J 






} 










[_ 1-millim. units. 




Standard deviation 
in 







2 -5311 


4 -1414 


-9574 


3 -4541 


r Standard deviation 
1 "^ 




2-millim. units 
















L 2-millim. units. 




Third moments | 
in f- 





- 4 




- 2 -729 


+ 85-816 





+ 5 -206 


r Third moments 
1 ^° 




2-millim. units J 
















L 2-niilIim. units. 

























Table III. — Correlation between Age and Auricular Height of Head in Girls.



w 



Totals . 



millims. 
102 -25-104 -25 

104 -25-106 -25 

106 -25-108 -25 

108 -25-110 -25 

110 -25-112 -25 

112 -25-114 -25 

114 -25-116 -25 

116 -25-118 -25 

118 -25-120 -25 

120 -25-122 -25 

122-25-124-25 

124 -25-126 -25 

126 -25-128 -25 

128 -25-130 -25 

130 -25-132 -25 

132 -25-134 -25 

134 -25-136 -25 

136 -25-138 -25 

138 -25-140 -25 

140 -25-142 -25 

142 -25-144 -25 

144. -25-146 -25 

146 -25-148 -25 



Means 

iu 

1-millim. units 



Standard deviation 

in 

2-millim. units 



Third moments "l 

™ . r 

2-milluu. umts J 



3-4. 



4-5. 



115 -2500 



116 -9643 



2 -8853 



- 42-822 



Age. 



5-6. 



18 



117 -4722 



2 -9276 



- 18-108 



6-7. 



7-8. 



40 



119 -1000 



2 -9641 



7-679 



1 

5 
1 
4 
7 
9 
13 
9 
7 
6 
9 

3 
1 
1 



76 



120 -3026 



2 -9882 



+ 1 -782 



8-9. 



2 

5 

3 

8 

7 

22 

19 

17 

19 

10 

6 

6 



125 



121 -6340 



2 -6366 



- 6 -171 



9-10. 



1 
1 
1 

12 
10 
15 
10 
24 
25 
23 
18 

8 

9 

5 

7 

3 

3 

1 
1 



177 



121 -7246 



3 3877 



+ 15-893 



10-11. 



4 

3 

8 

14 

23 

25 

29 

34 

33 

21 

17 

7 

8 

4 

2 

2 



235 



122 -8160 



2 -9653 



+ 2 -330 



11-12. 



2 

4 

2 

6 

6 

11 

15 

37 

34 

38 

29 

27 

16 

13 

10 

4 

2 

3 

2 



261 



123 -1427 



3 -2089 



+ -238 



12-13. 



2 

5 

9 

16 

18 

44 

41 

33 

40 

27 

20 

17 

13 

9 

10 

3 

1 



309 



123 -8908 



3 -2061 



+ 8 -219 



13-14. 



2 
2 
4 

3 

4 
10 
13 
23 
32 
21 
32 
32 
39 
17 

8 
11 

4 

2 

2 

2 



263 



124 -8622 



3 -3589 



- 7 -286 



14-15. 



1 

9 

3 

7 

9 

11 

21 

22 

23 

20 

25 

15 

5 

13 

5 

2 

4 

3 



198 



125 -7146 



3 -5865 



+ 3 -015 



icular Height of Head in Girls. 
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Totals. 
























4-15. 


15-16. 


16-17. 


17-18. 


18-19. 


19-20, 


20-21. 


21-22. 


22-23. 
































— 


2 


millims. 
102 -25-104 -25 




— 


— 


— 


— 


— 


— 


— 


— 


— 


10 


104 -25-106 -25 




— 


1 


— 


— 


— 


— 


— 


— 


— 


10 


106 -25-108 -25 




1 


3 


1 


— 


— 


1 


— 


— 


— 


27 


108 -25-110 -25 




9 


2 


4 




1 


— 


— 


— 


— 


56 


110 -25-112 -25 




3 


5 


5 


1 


— 


— 


— 


— 


— 


59 


112 -25-114 -25 




7 


6 


8 


2 


2 


— 


1 


— 


— 


115 


114 -25-116 -25 




9 


11 


6 


4 


3 


— 


— 


1 


— 


142 


116 -25-118 -25 




11 


19 


6 


G 


3 


2 


1 


— 


— 


244 


118 -25-120 -25 




21 


15 


13 


9 


4 


— 


— 


3 


— 


265 


120 -25-122 -25 


w 

crq* 

8. 

r 


22 
2.3 
20 


18 
26 
18 


25 
14 
16 


9 
12 
13 


4 

10 

9 


1 
1 


2 

1 


1 


1 

1 


261 
265 
219 


122 -25-124 -25 
124 -25-126 -25 
126 -25-128 -25 


25 


29 


16 


11 


7 


— 


1 


1 


— 


197 


128 -25-130 -25 




15 


18 


12 


6 


6 


4 


1 


1 


— 


131 


130 -25-132 -25 




5 


16 


7 


7 


6 


— 


— 


— 


— 


88 


132 -25-134 -25 




13 


9 


11 


8 


2 


1 


— 


— 


— 


77 


134 -25-136 -25 




5 


14 


6 


3 


2 


1 


— 


— 


— 


52 


136 -25-138 -25 




2 


2 


4 


2 


— 


— 


— 


— 


— 


20 


138 -25-140 -25 




i 


2 


2 


— 


1 


1 


— 


— 


— 


16 


140 -25-142 -25 




3 


— 


4 


— 


1 


— 


— 


— 


— 


11 


142 -25-144 -25 




— 


— 


2 


1 


— 


— 


— 


1 


— 


4 


144 -25-146 -25 




— 


— 


— 


— 


— 


1 


— 


— 


— 


1 


146 -25-148 -25 




198 


214 


162 


95 


61 


13 


7 


8 


2 


2272 


Totals. 


5 '7146 


126 -1565 


126 -5340 


126 -9132 


127 -0205 


129 -5577 


123 -8214 


126 -5000 


125 -2500 


124 -0467 


Means 

in 

1-millim. units. 


3 -5865 


3-4fi63 


3 '8696 


3 -1679 


3 -1235 


4-8406 


2 -5311 


4 -1414 


-9574 


3 -4541 


Standard deviation 

in 

2.milliin. units. 


3 015 


- 9 -615 


+ 9 -379 


+ 2 -991 


+ 0'070 


- 29-164 


- 2 -729 


+ 85-816 





+ 5 -206 


Third momenta 

in 
2-millini. units. 
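The array constants of Table III. can be checked directly from the frequencies. The sketch below is only an illustration: it assumes that every girl in a class is placed at the class midpoint and that Sheppard's correction (subtracting 1/12 of the squared grouping unit) was applied to the second moment. The table does not state the second assumption; it is adopted here because it reproduces the printed values for ages 22-23 and 15-16. The function name `array_constants` is hypothetical.

```python
import math

def array_constants(midpoints_mm, counts, unit_mm=2.0):
    """Return (total, mean in mm, standard deviation in grouping units) for one array."""
    n = sum(counts)
    mean = sum(m * c for m, c in zip(midpoints_mm, counts)) / n
    # Raw second moment about the array mean, in units of the 2-millim. grouping.
    raw_var = sum(c * ((m - mean) / unit_mm) ** 2
                  for m, c in zip(midpoints_mm, counts)) / n
    # Assumed Sheppard correction: subtract 1/12 (grouping unit = 1 on this scale).
    sd = math.sqrt(raw_var - 1.0 / 12.0)
    return n, mean, sd

# Ages 22-23: one girl in 122.25-124.25 and one in 126.25-128.25.
n_a, mean_a, sd_a = array_constants([123.25, 127.25], [1, 1])

# Ages 15-16: classes 106.25-108.25 up to 140.25-142.25.
mids_b = [107.25 + 2 * i for i in range(18)]
counts_b = [1, 3, 2, 5, 6, 11, 19, 15, 18, 26, 18, 29, 18, 16, 9, 14, 2, 2]
n_b, mean_b, sd_b = array_constants(mids_b, counts_b)

print(n_a, round(mean_a, 4), round(sd_a, 4))
print(n_b, round(mean_b, 4), round(sd_b, 4))
```

Under these assumptions the two arrays give means of about 125.25 and 126.1565 millims. and standard deviations of about .9574 and 3.4663, in agreement with the table.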




Height Constants.

Mean height = 124.0467 millims.

$\sigma_y = 3.454,125$,
$\mu_2 = 11.930,977$,
$\mu_3 = 5.206,247$,
$\mu_4 = 438.639,633$,
$\beta'_1 = .015,960$,
$\beta'_2 = 3.081,454$,

in 2-millim. units.

Further

$\Sigma_M = 2.093,366$ millims.,
$\lambda_2 = 4.382,181$,
$\lambda_4 = 62.399,135$,

in 1-millim. units. Hence

$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = .062,340$.

Age Constants.

Mean age = 12.7007.

$\sigma_x = 3.064,819$,
$\nu_2 = 9.393,110$,
$\nu_3 = 1.051,882$,
$\nu_4 = 239.157,055$,
$\nu_5 = 104.298,702$,
$\nu_6 = 9536.265,059$,

in year units.

$\beta_1 = .001,335$,
$\beta_2 = 2.710,593$,
$\beta_3 = .014,093$,
$\beta_4 = 11.506,681$,
$\sqrt{\beta_1} = +.036,538$,
$\phi_2 = 1.709,258$,
$\phi_3 = .250,123$,
$\phi_4 = 4.158,032$.



In the next place the products were worked out and referred to the means with the following results:*

$p_{11} = 3.113,712$, whence $r = .294,128$,
$p_{21} = -1.957,022$, $\epsilon = -.071,065$,
$p_{31} = 74.447,616$, $\zeta = -.048,576$,
$p_{41} = -108.701,559$, $\theta = -.470,126$.

Further, from $\Sigma_M$, $\eta = .303,024$.

* These products were in this case (as in all other cases) verified by calculating from the means of the arrays $y_{x_p}$, the expressions

$S\{n_{x_p} x_p (y_{x_p} - \bar y)\}/N$, $S\{n_{x_p} x_p^2 (y_{x_p} - \bar y)\}/N$, $S\{n_{x_p} x_p^3 (y_{x_p} - \bar y)\}/N$, $S\{n_{x_p} x_p^4 (y_{x_p} - \bar y)\}/N$.

Of course it is easiest to calculate these products about some arbitrary origin coinciding with the abscissa of one array. If these products be then $p'_{11}$, $p'_{21}$, $p'_{31}$, $p'_{41}$, and $\bar x'$ be the mean, we have

$p_{11} = p'_{11}$,

$p_{21} = p'_{21} - 2\bar x' p'_{11}$,

$p_{31} = p'_{31} - 3\bar x' p'_{21} + 3\bar x'^2 p'_{11}$,

$p_{41} = p'_{41} - 4\bar x' p'_{31} + 6\bar x'^2 p'_{21} - 4\bar x'^3 p'_{11}$.

In deducing the product-moments after they had been referred to the means, the proper Sheppard's corrections were introduced. These are, if $\{p_{11}\}$, $\{p_{21}\}$, $\{p_{31}\}$, $\{p_{41}\}$ represent the uncorrected moments:

$p_{11} = \{p_{11}\}$, $p_{21} = \{p_{21}\}$,

$p_{31} = \{p_{31}\} - \tfrac{1}{4}\{p_{11}\}$, $p_{41} = \{p_{41}\} - \tfrac{1}{2}\{p_{21}\}$,

the units of grouping being the units throughout.
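The transfer of the product moments from an arbitrary origin to the mean, followed by Sheppard's corrections, can be sketched as follows. This is an illustration only: the function names are hypothetical, the small data set is invented purely to exercise the formulas, and the corrections are read as subtracting one quarter of the first product moment from the third and one half of the second from the fourth.

```python
def moments_about_origin(x_vals, ybar_vals, counts):
    """x-mean and p'_11..p'_41 with x measured from an arbitrary origin."""
    n_total = sum(counts)
    ybar = sum(c * y for c, y in zip(counts, ybar_vals)) / n_total
    xbar = sum(c * x for c, x in zip(counts, x_vals)) / n_total
    p_prime = [
        sum(c * x ** k * (y - ybar) for c, x, y in zip(counts, x_vals, ybar_vals)) / n_total
        for k in (1, 2, 3, 4)
    ]
    return xbar, ybar, p_prime

def transfer_to_mean(xbar, p_prime):
    """Apply the four transfer formulas to obtain p_11..p_41 about the mean."""
    q1, q2, q3, q4 = p_prime
    p11 = q1
    p21 = q2 - 2 * xbar * q1
    p31 = q3 - 3 * xbar * q2 + 3 * xbar ** 2 * q1
    p41 = q4 - 4 * xbar * q3 + 6 * xbar ** 2 * q2 - 4 * xbar ** 3 * q1
    return p11, p21, p31, p41

def sheppard(p11, p21, p31, p41):
    """Corrections in units of grouping (assumed reading of the printed formulas)."""
    return p11, p21, p31 - p11 / 4, p41 - p21 / 2

def direct_about_mean(x_vals, ybar_vals, counts):
    """Same product moments computed directly about the mean, for comparison."""
    xbar, ybar, _ = moments_about_origin(x_vals, ybar_vals, counts)
    n_total = sum(counts)
    return tuple(
        sum(c * (x - xbar) ** k * (y - ybar) for c, x, y in zip(counts, x_vals, ybar_vals)) / n_total
        for k in (1, 2, 3, 4)
    )

# Invented arrays: abscissa, array mean of y, array frequency.
xs, ys, ns = [0, 1, 2, 3, 4], [1.0, 2.5, 3.0, 3.2, 2.0], [3, 10, 20, 12, 5]
xbar, ybar, p_prime = moments_about_origin(xs, ys, ns)
via_transfer = transfer_to_mean(xbar, p_prime)
via_direct = direct_about_mean(xs, ys, ns)
corrected = sheppard(*via_transfer)
print(via_transfer)
print(corrected)
```

The two routes agree to rounding error, which is the verification the footnote describes.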
From the constants for the arrays, I found 

$\chi_1 - 1 = -.000,675$, $\chi_2 = -.007,198$.

Whence the probable error of $\eta$ was determined by (xxxiii.). Its value was*

Probable error of $\eta = .012,913$.

If found from the simple formula $.67449\,(1 - \eta^2)/\sqrt{N}$, the value is .012,851. We
accordingly are again forced to the conclusion that the probable error of $\eta$ may for practical purposes be
found from this simple formula, instead of the complicated result (xxxiii.). Although
both $\chi_1 - 1$ and $\chi_2$ are small, it is very doubtful whether we can legitimately consider
the system as homoscedastic. The dotted line ab of Diagram II. would fairly well 
represent increasing variability with age. The skewness of the arrays is relatively 
small and changes sign so frequently, that we can certainly not attribute any law to 
such heteroclitic tendencies as there are. They are probably due to errors of random 
sampling from truly isocurtic material. 
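The two values of the probable error quoted above can be reproduced from the printed constants. The sketch below is illustrative only; it takes N = 2272 from Table III., the value of the correlation ratio from the text, and the five contributions of (xxxiii.) from the footnote, read as a sum divided by N.

```python
import math

N = 2272                      # total number of girls (Table III.)
eta = 0.303024                # correlation ratio

# Contributions of the successive terms of (xxxiii.), as printed in the footnote.
terms = [0.824785, 0.001870, 0.004673, -0.000472, 0.001888]

# Full expression: 0.67449 times the standard deviation of eta.
pe_full = 0.67449 * math.sqrt(sum(terms) / N)

# Simple formula: 0.67449 (1 - eta^2) / sqrt(N).
pe_simple = 0.67449 * (1 - eta ** 2) / math.sqrt(N)

print(round(pe_full, 6), round(pe_simple, 6))
```

The first term of the bracket is (1 - η²)², so the simple formula amounts to keeping that term alone; the remaining four terms change the result only in the fourth decimal place.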

It will be seen that the height frequencies with $\beta'_1 = .0160$ and $\beta'_2 = 3.0815$ do not
differ very much from a normal distribution; in fact, we can lay no stress on the
heteroclisy of the system at all. But the values of the standard deviations of the
arrays, or the graph of $\sigma_{n_x}/\sigma_y$, certainly show increasing variation with increasing age,
a phenomenon with which one is familiar in a variety of other human characters.†

This heteroscedasticity, due to increasing variation with growth, would hardly have
been anticipated from a mere inspection of the smallness of $\chi_1 - 1$; it is somewhat
obscured by the irregular values of the standard deviations of the small arrays at
the adult end of the age range. The mean value of the standard deviation of the
weighted arrays is $\sigma_y\sqrt{1 - \eta^2} = 3.2992$ in 2-millim. units.

* The contributions of the successive terms of (xxxiii.) are in fact given by

$\Sigma_\eta^2 = \frac{1}{N}\{.824,785 + .001,870 + .004,673 - .000,472 + .001,888\}$.

† See Pearson: 'The Chances of Death and other Studies of Evolution,' vol. I., pp. 296, 307, 310, 314.

We now turn to the regression curves to see how far the conditions for the
different types are satisfied. We have

$\eta^2 - r^2 = .005,312$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = .004,030$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = .000,604$.

But the first should be zero, if the regression be linear; the second, if it be
parabolic; and the third, if it be cubical.

We see increasing approximation to fulfilment of the several conditions. Referred
to axes through the mean age and head height, the following are the regression
curves:*

(a.) Straight line:

$Y_{x_p} = .662,979\,X_p$.

(b.) Parabola (from equation (lxv.)):

$Y_{x_p} = .055,749 + .667,570\,X_p - .041,001\,X_p^2$.

(c.) Cubic (from equation (lvi.)):

$Y_{x_p} = .280,194 + .722,886\,X_p - .029,580\,X_p^2 - .002,223\,X_p^3$.

(c'.) Cubic (from equation (lix.)):

$Y_{x_p} = .296,076 + .812,249\,X_p - .028,004\,X_p^2 - .005,740\,X_p^3$.

(c') will not give as good results as (c), for it depends on a use of the condition
(lvii.) which is not absolutely fulfilled.

The following table gives the values in the case of the four curves : — 

Table IV. — $y_{x_p}$ = Mean Auricular Height of Girl's Head at Given Age.

$x_p$ = age.   Regression line.   Regression parabola.†   Cubic (c).   Cubic (c').   Observed.
 3.5            117.95             114.49                  116.90       118.94        115.25
 4.5            118.61             115.87                  117.66       118.94        116.96
 5.5            119.27             117.17                  118.42       119.16        117.47
 6.5            119.94             118.39                  119.24       119.57        119.10
 7.5            120.60             119.52                  120.08       120.14        120.30
 8.5            121.26             120.57                  120.93       120.84        121.63
 9.5            121.92             121.55                  121.78       121.62        121.72
10.5            122.59             122.43                  122.62       122.45        122.82
11.5            123.25             123.24                  123.42       123.26        123.14
12.5            123.91             123.97                  124.18       124.15        123.89
13.5            124.58             124.61                  124.88       124.95        124.86
14.5            125.24             125.17                  125.52       125.65        125.71
15.5            125.90             125.65                  126.07       126.22        126.16
16.5            126.57             126.05                  126.52       126.68        126.53
17.5            127.23             126.36                  126.87       126.93        126.91
18.5            127.89             126.59                  127.09       126.96        127.02
19.5            128.55             126.75                  127.18       126.74        129.56
20.5            129.22             126.81                  127.11       126.22        123.82
21.5            129.88             126.80                  126.88       125.38        126.50
22.5            130.54             126.71                  126.48       124.28        125.25

* $Y_{x_p}$ is here measured in millimetres and $X_p$ in years.

† The maximum ordinate is at vertex of parabola, i.e., $x$ = 8.1409, or age 20.84; its magnitude = 126.82.
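The entries of Table IV. follow from the regression equations by measuring age from the mean age and adding the mean height to the ordinate. A minimal sketch, using the printed mean age 12.7007, mean height 124.0467 and the printed coefficients of the line, the parabola and cubic (c); the names `height_at` and `CURVES` are hypothetical.

```python
MEAN_AGE = 12.7007        # years
MEAN_HEIGHT = 124.0467    # millims.

# Coefficients in ascending powers of X (age measured from the mean age).
CURVES = {
    "line": (0.0, 0.662979),
    "parabola": (0.055749, 0.667570, -0.041001),
    "cubic_c": (0.280194, 0.722886, -0.029580, -0.002223),
}

def height_at(age, coeffs):
    """Mean auricular height in millims. predicted by one regression curve."""
    x = age - MEAN_AGE
    return MEAN_HEIGHT + sum(c * x ** k for k, c in enumerate(coeffs))

# Vertex of the parabola a + b X + c X^2 lies at X = -b / (2 c).
a, b, c = CURVES["parabola"]
vertex_x = -b / (2 * c)
vertex_age = MEAN_AGE + vertex_x
vertex_height = height_at(vertex_age, CURVES["parabola"])

for age in (3.5, 12.5, 22.5):
    print(age, [round(height_at(age, co), 2) for co in CURVES.values()])
print(round(vertex_x, 4), round(vertex_age, 2), round(vertex_height, 2))
```

The vertex comes out at X = 8.1409, that is age 20.84, with ordinate 126.82, as stated in the footnote to the table.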
An examination of this table and the graphs on Diagram II. seems to show:

(i.) That cubic (c) is considerably better than cubic (c').

(ii.) That we do get a sensible betterment in passing from parabola to cubic, and,
accordingly, that we must use in this case the cubic to effectively describe the regression
within the range of observation. Probably neither cubic nor parabola would effectively
serve for extrapolation even close to the limits of observation.

Thus the cubic (c') starting at 3-4 with its point of inflection is clearly
inadmissible, and the drop after 20 or 21 years of age, shown by both parabola and
cubic, is, of course, only due to the anomalous character of the few girls over 18 left
in the schools. Actually the shrinkage of measurements does not begin till at least
26 years, and is then far more gradual than these curves indicate.

But, as in all fitting of this kind, we obtain the best fit we can within the range,
entirely at the expense of what may occur just outside the range. For this reason,
as E. Perrin* has pointed out, a good interpolation curve is usually a bad
extrapolation curve.

We might sum up our results for auricular height with age in girls by saying : 
That the correlation is non-linear, effectively cubic ; heteroscedastic, there being 
increasing variability with growth ; that while the total height frequency is not very 
far from normal the array frequencies are slightly heteroclitic, but so very irregular in 
sign, that probably we are dealing with a case of isocurtic homoclisy, to which the 
sparsity of data in the extreme arrays gives an appearance of anomic heteroclisy. 

(10.) Illustration C. — On the Skew Correlation between Size of Cell and Size of Body 

in Daphnia magna. 

Dr. E. Warren has dealt with this point in a memoir published in 'Biometrika,'
vol. II., pp. 255-9. The resulting regression curve of size of cell for given size of
body is very far from linear, and it is quite clear that the correlation is skew. It 
has already been noted in ' Biometrika ' that the relationship is considerably obscured 
by the irregularities produced by ecdysis. Our object at present, however, is purely 
theoretical, namely, to show how a certain system of constants and of curves describes 
the actual correlationship, and for this purpose Dr. Warren's observations form as 
good material for graduation as we could expect to find. The following Table V. 
gives the observations with the working scales attached. I must refer to 
Dr. Warren's paper (p. 256) for the relation between the units of grouping on the 
working scales and those of the actual measurements on body and cell lengths. As 
far as correcting the raw moments is concerned, Sheppard's corrections were used 
for the cell sizes, but not for the body lengths, because the number of individuals in 
the latter case was perfectly arbitrary and there is no approach to high contact. The
product moments were also uncorrected. The product moments were found in both
ways (see p. 35, footnote) and the results thus verified.

* 'Biometrika,' vol. III., p. 99.

Table V. gives the means, standard deviations, and third moments of the arrays ; 
the latter are all small and superficially irregular in sign. I think we may say that 
there is no marked and continuous heteroclisy. On the other hand, I think we may 
say that while the clitic curve deviates to and fro from a zero base, the scedastic 
curve would fit better to a parabolic curve than to the straight line which is its 
mean. In other words, the variability of the cells increases with size of body (i.e.,
growth) up to a certain stage and then decreases again. This result is obscured by
the fall of the variability after each ecdysis. Roughly the ecdyses produce a rhythm 
in all three curves, the regression curve, the scedastic curve, and the clitic curve. 
When the means of the arrays are above the regression cubic, then the ordinates of 
the scedastic curve are above their mean and those of the clitic curve show positive 
skewness ; when they are below the regression curve, we have lessened variability 
and negative skewness. In other words, the ecdyses are accompanied by lessened 
cell variability and negative skewness of distribution. I think we may state that 
there is a nomic heteroscedasticity due to growth of body, giving first an increased 
variability with growth and afterwards a decrease with age. There is probably 
isocurtic homoclisy. Both of these are, however, obscured by a semi-rhythmic 
heteroscedasticity and heteroclisy introduced by the ecdyses. 

We now turn to the constants of the cell and body length distributions, merely 
noting that all these constants are given in terms of the units of the working scales. 

Body Length Constants.

Mean body length = 8.502,488,
$\sigma_x = 3.864,784$,
$\nu_2 = 14.936,562$,
$\nu_3 = -5.125,806$,
$\nu_4 = 432.769,533$,
$\nu_5 = -425.276,682$,
$\nu_6 = 15192.5375$,
$\beta_1 = .007,885$,
$\beta_2 = 1.939,793$,
$\beta_3 = .043,796$,
$\beta_4 = 4.559,091$,
$\sqrt{\beta_1} = -.088,798$,
$\phi_2 = .931,908$,
$\phi_3 = -.232,167$,
$\phi_4 = .788,409$.

Cell Constants.

Mean cell = 9.268,657,
$\sigma_y = 2.541,734$,
$\mu_2 = 6.460,410$,
$\mu_3 = 2.142,362$,
$\mu_4 = 123.921,496$,
$\beta'_1 = .017,021$,
$\beta'_2 = 2.969,111$.

Further

$\Sigma_M = 1.454,600$,
$\lambda_2 = 2.115,862$,
$\lambda_4 = 15.142,840$,
$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = .095,615$.
We have next the product moments referred to the means

$p_{11} = 3.892,863$, whence $r = .394,862$,

$p_{21} = -12.104,322$, $\epsilon = -.281,831$,

$p_{31} = 127.348,064$, $\zeta = .098,578$,

$p_{41} = -541.433,455$, $\theta = -.759,344$.

Further, from $\Sigma_M$,

$\eta = .572,287$.

From the constants for the arrays I deduced

$\chi_1 - 1 = -.108,148$, $\chi_2 = .088,323$.

These are higher values of $\chi_1 - 1$ and $\chi_2$ than we have found in the first two
illustrations.

We now obtain, showing the contribution of each term of (xxxiii.),

$\Sigma_\eta^2 = \frac{1}{N}\{.452,240 - .002,528 + .010,803 - .013,180 - .027,875\}$.

Whence probable error of $\eta = .67449\,\Sigma_\eta = .0097$.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found it
equal to .0101. The difference is greater than in the two previous illustrations, but
is only .0004, and this would have no significance in any practical use of the probable
error. We again conclude, therefore, that (xxxiv.) is sufficiently close to replace
(xxxiii.) in practice.

For the mean standard deviation of the weighted arrays we have

$\sigma_y\sqrt{1 - \eta^2} = 2.084,358$.

If we now examine the criteria for the nature of the regression, we have

$\eta^2 - r^2 = .171,596$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = .080,483$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = .079,457$.

We should conclude, therefore, that linear regression is inadmissible, but that 
parabolic or cubic will be moderately successful, the latter not very much better than 
the former. Our moderate success only in this case is, of course, due to the irregu- 
larity of the results to be graduated, the influence of the ecdyses being so disturbing 
that we really need a curve periodically varying from the graduated regression curve. 
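The three criteria for Daphnia can be recomputed from the printed constants. The sketch below is illustrative; the function name `regression_criteria` is hypothetical, and the form of the third expression is the one that reproduces the printed values when the printed constants are inserted.

```python
def regression_criteria(eta, r, eps, zeta, phi2, phi3, phi4):
    """Expressions that should vanish for linear, parabolic and cubic regression."""
    linear = eta ** 2 - r ** 2
    parabolic = phi2 * linear - eps ** 2
    cubic = parabolic - (zeta * phi2 - eps * phi3) ** 2 / (phi2 * phi4 - phi3 ** 2)
    return linear, parabolic, cubic

# Printed constants for cell size and body length in Daphnia magna.
daphnia = regression_criteria(
    eta=0.572287, r=0.394862,
    eps=-0.281831, zeta=0.098578,
    phi2=0.931908, phi3=-0.232167, phi4=0.788409,
)
print([round(v, 6) for v in daphnia])
```

The large drop from the first to the second value, and the small one from the second to the third, is the numerical form of the statement that the parabola is a great improvement on the line while the cubic adds little.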

We have the following regression curves:

(a.) Straight line:

$Y_{x_p} = .259,687\,X_p$.

(b.) Parabola from (lxv.):

$Y_{x_p} = 1.097,690 + .236,135\,X_p - .073,490\,X_p^2$.

The maximum occurs when $X_p = 1.6066$, and is given by $Y_{x_p} = 1.2874$, thus occurring
within the limits of observation.*

(c.) Cubic from (lix.):

$Y_{x_p} = .752,856 + .193,058\,X_p - .049,817\,X_p^2 + .001,710\,X_p^3$.

In all these cases $Y_{x_p}$ and $X_p$ are measured from the means of the cell and body
lengths, or from 9.268,657 and 8.502,488 respectively.

Table VI. gives the calculated and observed results, and the whole system is 
represented in Diagram III. Either the parabola or cubic graduates quite well the 
results, allowing for the periodic deviation, and we may fairly describe the system as 
a heteroscedastic cubic regression with isocurtic homoclisy. The correlation ratio is 
very sensibly different from the correlation coefficient. The regression cubic does not 
differ widely from that given in ' Biometrika,' which was obtained without weighting 
the means of the arrays, and by simply striking the best cubic of the given type 
through the points. 

Table VI. — $y_{x_p}$ = Mean Cell Length for Given Body Length in Daphnia.

$x_p$ = body length.   Regression line.   Regression parabola.   Regression cubic.   Observed.
 1                       7.320              4.458                  5.047               5.300
 2                       7.580              5.724                  6.190               5.833
 3                       7.840              6.842                  7.166               7.790
 4                       8.099              7.813                  7.986               8.050
 5                       8.359              8.638                  8.661               9.473
 6                       8.619              9.315                  9.200               8.436
 7                       8.879              9.846                  9.613               8.596
 8                       9.138             10.229                  9.912              10.267
 9                       9.398             10.466                 10.105              10.761
10                       9.658             10.555                 10.205              11.027
11                       9.917             10.498                 10.220              10.953
12                      10.177             10.293                 10.161               9.100
13                      10.437              9.942                 10.038               9.000
14                      10.696              9.443                  9.861              10.036
15                      10.956              8.798                  9.642              10.317



* Actual values on working scales, $x_p = 10.1091$ and $y_{x_p} = 10.5560$.

(11.) Illustration D. — On the Skew Correlation between Number of Branches to the
Whorl and Position of the Whorl on the Stem in Equisetum arvense.

I have selected this example not on account of any biological importance, because
the material is (especially with regard to the first and last two whorls) unsatisfactory
either on account of irregularity or of insufficiency of material. It has been taken
purely from its statistical interest, because it gives a series with markedly skew
correlation, having a regression curve of a rough S-shaped character. If we omit
the first and last whorls, we get, as I have already shown,* a remarkably close fit
with a cubical regression curve. My present object, however, is not to consider any
law of growth, but merely a mass of statistical material, to be dealt with by the
processes of the present paper.

We may anticipate that the irregularities of the series, indicated in the memoir
just referred to, will make themselves manifest in a less satisfactory fitting of the
regression curve than occurs when we deal with the more homogeneous group of
equally weighted whorls fitted in the diagram of that paper. Table VII. gives the
data, with the means, standard deviations, and third moments of each array.

The axis of $x$ shall be taken to give the position of the whorl on the stem and that
of $y$ to denote the number of branches. We require the regression curve of $y$ on $x$,
or the probable number of branches on a whorl in a given position. We shall not
use Sheppard's corrections for the moments of either the $x$ or $y$-characters, as high
contact certainly does not hold for both at the low-value ends of their ranges.

We have the following constants:

Position Constants.

Mean position = 6.403,315,
$\sigma_x = 3.542,604$,
$\nu_2 = 12.550,046$,
$\nu_3 = 8.249,534$,
$\nu_4 = 319.515,824$,
$\nu_5 = 644.095,176$,
$\nu_6 = 11203.5814$,
$\beta_1 = .034,429$,
$\beta_2 = 2.028,625$,
$\beta_3 = .214,190$,
$\beta_4 = 5.667,884$,
$\sqrt{\beta_1} = .185,550$,
$\phi_2 = .994,196$,
$\phi_3 = .592,384$,
$\phi_4 = 1.518,136$.

Branch Constants.

Mean number of branches = 7.216,851,
$\sigma_y = 3.278,499$,
$\mu_2 = 10.748,557$,
$\mu_3 = -24.313,478$,
$\mu_4 = 245.811,660$,
$\beta'_1 = .476,044$,
$\beta'_2 = 2.127,658$.

Further

$\Sigma_M = 2.789,949$,
$\lambda_2 = 7.783,815$,
$\lambda_4 = 140.441,685$.

Hence

$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = -.170,503$.
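The derived position constants can be recomputed from the moments. The sketch below is an illustration under a stated assumption: the defining equations (li.) and (liv.) for the φ's are not reproduced in this part of the memoir, so the expressions used here are inferred from the printed values, which they reproduce for the Equisetum position constants. The function name `beta_phi` is hypothetical, and the third moment must not be zero, since φ3 is divided by the square root of β1.

```python
import math

def beta_phi(nu2, nu3, nu4, nu5, nu6):
    """Return the betas, sqrt(beta1) with the sign of nu3, and phi2..phi4."""
    beta1 = nu3 ** 2 / nu2 ** 3
    beta2 = nu4 / nu2 ** 2
    beta3 = nu3 * nu5 / nu2 ** 4
    beta4 = nu6 / nu2 ** 3
    root_b1 = math.copysign(math.sqrt(beta1), nu3)
    # Inferred forms (assumption): they agree with the printed constants.
    phi2 = beta2 - beta1 - 1
    phi3 = (beta3 - beta1 * beta2 - beta1) / root_b1
    phi4 = beta4 - beta2 ** 2 - beta1
    return {
        "beta1": beta1, "beta2": beta2, "beta3": beta3, "beta4": beta4,
        "root_beta1": root_b1, "phi2": phi2, "phi3": phi3, "phi4": phi4,
    }

# Moments of whorl position in Equisetum arvense, as printed.
position = beta_phi(12.550046, 8.249534, 319.515824, 644.095176, 11203.5814)
for name, value in position.items():
    print(name, round(value, 6))
```

The same function applied to the body-length moments of Daphnia returns a negative square root of β1 and a negative φ3, because the third moment there is negative.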



* 'Proc. Roy. Soc.,' vol. 71, p. 308.

We have next the product moments referred to the means

$p_{11} = -8.225,585$, whence $r = -.708,222$,

$p_{21} = -21.471,321$, $\epsilon = -.390,436$,

$p_{31} = -205.084,042$, $\zeta = +.029,733$,

$p_{41} = -917.984,938$, $\theta = -.960,212$.

Further, from $\Sigma_M$,

$\eta = .850,984$.

From the constants for the arrays we deduce

$\chi_1 - 1 = -.356,367$, $\chi_2 = -.312,952$.

We now obtain, showing the contribution of each term of (xxxiii.),

$\Sigma_\eta^2 = \frac{1}{N}\{.076,080 - .157,932 + .055,359 + .079,662 + .038,579\}$.

Whence probable error of $\eta = .67449\,\Sigma_\eta = .0054$.

Had we calculated the probable error of $\eta$ from (xxxiv.) we should have found it
equal to .0049. The difference .0005 is not of importance for practical purposes.
Yet in this case it is clear that the values of $\chi_1 - 1$ and $\chi_2$ are very sensible. Thus we
see that a very marked heteroscedastic and heteroclitic system with continuously
changing standard deviation and skewness scarcely affects for practical purposes
(i.e., to three significant figures) the probable error of $\eta$. All four of our illustrations
therefore confirm the conclusion that:

For practical purposes the probable error of the correlation ratio, $\eta$, may be taken
as $.67449\,(1 - \eta^2)/\sqrt{N}$.

Our Diagram IV. gives the values of the relative standard deviations of the arrays,
or $\sigma_{n_x}/\sigma_y$, the horizontal line giving $\sqrt{1 - \eta^2} = .5252$, or the mean value of the relative
standard deviations of the weighted arrays. We have also the clitic curve giving
$\tfrac{1}{2}\sqrt{\beta_1}$ for each array.* The remarkable smoothness of these scedastic and clitic curves
in this case indicates how far certain types of correlation surfaces diverge from pure
normality of distribution, the divergence being obviously nomic.

We now turn to the regression curves and write down the conditions for the
different types; the three expressions should be zero for linear, parabolic, and
cubical regression respectively:

$\eta^2 - r^2 = .222,596$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = .068,864$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = .010,127$.

* $\tfrac{1}{2}\sqrt{\beta_1}$ = difference between mode and mean divided by standard deviation = skewness in the case of
skew-curves of Type III. ('Phil. Trans.,' A, vol. 186, p. 373), and may be taken as a reasonable measure of
the skewness for those cases in which the fuller form involving $\beta_2$ would involve too laborious calculations.
If in equation (xii.) of the present memoir we put $\beta_2$ = 3 + a small quantity, and remember that $\beta_1$ is itself
a small quantity, we see that the more correct formula for the skewness involving $\beta_2$ reduces, neglecting
terms of the 2nd order, to $\tfrac{1}{2}\sqrt{\beta_1}$.
We see at once that the straight line is inadmissible, the parabola will not be very
good, and the cubic only moderately appropriate. The conditions are not nearly so
closely fulfilled as in the cases of woodruff and head heights; the last two are better
than in the case of Daphnia cells, but while the deviations in the case of Daphnia
were irregular, there being no approximate smoothness in the scedastic or clitic
curves, we shall find here more uniform deviations which would probably be partially
allowed for by a quartic regression curve.

The following are the regression curves:

(a.) Straight line:

$Y_{x_p} = -.655,423\,X_p$.

(b.) Parabola from (lxv.):

$Y_{x_p} = 1.551,307 - .574,171\,X_p - .123,610\,X_p^2$.

The maximum ordinate is at the position $X_p = -2.3225$, or $x_p = 4.0808$, with
maximum number of branches $y_{x_p} = 9.435$.

(c.) Cubic from (lvi.):

$Y_{x_p} = 1.590,413 - .987,694\,X_p - .137,641\,X_p^2 + .016,605\,X_p^3$.

In all cases $X_p$ and $Y_{x_p}$ are measured from the mean position and the mean number
of branches, i.e., 6.403,315 and 7.216,851 respectively.

The following table contains the calculated and observed results : — 

Table VIII. — Mean Number of Branches to each Whorl in Equisetum.

Position.   Regression line.   Regression parabola.   Regression cubic.   Observed.   Regression cubic without first whorl.
 1           10.758              8.262                  7.506               7.619       [8.207]
 2           10.103              8.900                  9.070               9.294        8.929
 3            9.447              9.291                  9.920               9.627        9.869
 4            8.792              9.434                 10.156               9.730       10.161
 5            8.137              9.330                  9.876               9.643        9.911
 6            7.481              8.980                  9.182               9.427        9.224
 7            6.826              8.382                  8.172               8.732        8.205
 8            6.170              7.536                  6.947               7.297        6.962
 9            5.515              6.444                  5.605               5.555        5.599
10            4.859              5.104                  4.247               3.964        4.223
11            4.204              3.517                  2.971               2.443        2.939
12            3.549              1.683                  1.879               1.866        1.854
13            2.893             -0.399                  1.069               1.462        1.072
14            2.238             -2.727                  0.641               1.333        0.700
15            1.582             -5.303                  0.694               1.250        0.844
16            0.927             -8.126                  1.328               1.000        1.610
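The calculated columns of Table VIII. can be reproduced by evaluating the three regression equations with position measured from the mean position and then adding the mean number of branches. A minimal sketch using the printed coefficients; the names `branches_at` and `EQUISETUM` are hypothetical.

```python
MEAN_POSITION = 6.403315
MEAN_BRANCHES = 7.216851

# Coefficients in ascending powers of X (position measured from the mean).
EQUISETUM = {
    "line": (0.0, -0.655423),
    "parabola": (1.551307, -0.574171, -0.123610),
    "cubic": (1.590413, -0.987694, -0.137641, 0.016605),
}

def branches_at(position, coeffs):
    """Mean number of branches predicted for a whorl in the given position."""
    x = position - MEAN_POSITION
    return MEAN_BRANCHES + sum(c * x ** k for k, c in enumerate(coeffs))

for pos in (1, 8, 13, 16):
    row = [round(branches_at(pos, co), 3) for co in EQUISETUM.values()]
    print(pos, row)
```

The parabola already gives a negative number of branches at position 13, which is why it is rejected in the discussion of the table.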



In the last column I have placed the results of re-working the whole system,
omitting the first whorl as largely influenced by the ground condition at the foot of
the stem.* The improvement of fit is not sufficiently great to justify a publication of
all the constants for the distribution in this modified case. But there is improvement
for the higher whorls, which are so few in number as to be wholly insignificant when
compared with the weight of the first few low whorls.

It will be noticed at once that the line and the parabola (which gives at the top of
the stem negative numbers!) are absolutely unsuitable for representing the facts of
the case. The cubic is better and certainly gives the general trend of the observations,
but in this our last illustration we have clearly reached the limit of material to
which such cubical regression can be satisfactorily applied. See Diagram V.

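* 'Roy. Soc. Proc.,' vol. 71, pp. 308-310.

(12.) Quartic Regression.

It seemed of some interest in this case of Equisetum to ascertain whether any real
improvement in description would be reached by considering the quartic regression
curve. I briefly indicate the theory in this case as developed from the general
method in the footnote, p. 25. We shall now have

$Y_{x_p}/\sigma_y = b_0 + b_1 (X_p/\sigma_x) + b_2 (X_p/\sigma_x)^2 + b_3 (X_p/\sigma_x)^3 + b_4 (X_p/\sigma_x)^4$.

Eliminating $b_0$ and $b_1$ by the processes familiar to us from the case of cubical
regression, we have

$Y_{x_p}/\sigma_y = r (X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}\,(X_p/\sigma_x) - 1\}$

$\qquad + b_3\{(X_p/\sigma_x)^3 - \beta_2 (X_p/\sigma_x) - \sqrt{\beta_1}\}$

$\qquad + b_4\{(X_p/\sigma_x)^4 - (\beta_3/\sqrt{\beta_1})(X_p/\sigma_x) - \beta_2\}$ (lxx.).

Hence as before

$\epsilon = b_2\phi_2 + b_3\phi_3 + b_4\phi_5$,

$\zeta = b_2\phi_3 + b_3\phi_4 + b_4\phi_6$,

$\theta = b_2\phi_5 + b_3\phi_6 + b_4\phi_7$ (lxxi.),

where $\phi_2$, $\phi_3$, and $\phi_4$ are given as before by (li. and liv.), while

$\phi_5 = \beta_4 - \beta_3 - \beta_2$ (lxxii.),

$\phi_6 = (\beta_5 - \beta_2\beta_3 - \beta_1\beta_2)/\sqrt{\beta_1}$ (lxxiii.),

$\phi_7 = (\beta_1\beta_6 - \beta_1\beta_2^2 - \beta_3^2)/\beta_1$ (lxxiv.),

and

$\beta_5 = \nu_3\nu_7/\sigma_x^{10}$, $\beta_6 = \nu_8/\sigma_x^8$ (lxxv.).

Solving, we have

$b_4 = \dfrac{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)}{\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_3^2\phi_7 + 2\phi_3\phi_5\phi_6}$ (lxxvi.)

and

$b_2 = \dfrac{\epsilon\phi_4 - \zeta\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\dfrac{\phi_4\phi_5 - \phi_3\phi_6}{\phi_2\phi_4 - \phi_3^2}$, $\quad b_3 = \dfrac{\zeta\phi_2 - \epsilon\phi_3}{\phi_2\phi_4 - \phi_3^2} - b_4\,\dfrac{\phi_2\phi_6 - \phi_3\phi_5}{\phi_2\phi_4 - \phi_3^2}$ (lxxvii.).

Substituting in (lxx.), the solution is completed. The advantage of this form is that
we see clearly the modifications made in $b_2$ and $b_3$ as we pass from cubical to quartic
regression. On the other hand, $\phi_6$ and $\phi_7$, as shown by (lxxv.), involve the 7th and
8th moments of the $x$-character. These are not only very laborious to calculate, but,
as we have already shown, are as a rule very untrustworthy.

If we proceed as on p. 26, equation (lvii.), we find

$\eta^2 - r^2 = b_2\epsilon + b_3\zeta + b_4\theta$ (lxxviii.).

Using this and not the third equation of (lxxi.), we replace (lxxvi.) by

$b_4 = \dfrac{(\phi_2\phi_4 - \phi_3^2)\left\{\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\}\right\}}{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)}$ (lxxix.).

This equation for $b_4$ only involves the 7th and not the 8th moment, but like the
corresponding form (lx.) suffers from being a ratio of small quantities. (lxxvii.)
completes the solution as before.

(lxxvii.) and (lxxix.) in conjunction give us a necessary condition for quartic
regression. We can indeed now write the whole series of conditions as follows:

Linear regression:

$\eta^2 - r^2 = 0$.

Parabolic regression:

$\eta^2 - r^2 - \epsilon^2/\phi_2 = 0$.

Cubical regression:

$\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\} = 0$.

Quartic regression:

$\eta^2 - r^2 - \epsilon^2/\phi_2 - \dfrac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \dfrac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)(\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_3^2\phi_7 + 2\phi_3\phi_5\phi_6)} = 0$ (lxxx.).

We now have a third possibility: we can get rid of the fourth product moment $\theta$
from the value of $b_4$ and write it:

$b_4 = \pm\sqrt{\dfrac{(\phi_2\phi_4 - \phi_3^2)\left\{\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\}\right\}}{\phi_2\phi_4\phi_7 - \phi_2\phi_6^2 - \phi_4\phi_5^2 - \phi_3^2\phi_7 + 2\phi_3\phi_5\phi_6}}$ (lxxxi.).

While this value of $b_4$ does not suffer like (lxxix.) from being the ratio of small
quantities, and would a priori appear to save the calculation of $\theta$, yet the right sign of
the root may not be obvious on inspection, so that an actual determination of $\theta$ to find
the sign of $b_4$ may after all be needful. If (lxxx.) were absolutely satisfied, (lxxxi.),
(lxxix.) and (lxxvi.) would lead to identical results; but this will rarely be true in
practice. In any of the three cases $b_2$ and $b_3$ will be given by (lxxvii.). On the
whole, I consider that (lxxxi.) and (lxxvi.) will give the better results, and probably
the former the best, but it will generally require as much arithmetic as the latter.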

(13). Illustration E. — Calculation of the Quartic Regression Curve in the Case 

of Equisetum arvense. 

The only new constants required are : 

1/7=43,207-386, whence ^85 = 1-144,882, 

vg = 507,649'540, ^86=20-463,633, 

and : 

<^5=3-425,069, <^s= 3-452,046, 

<^7 = 15-015792. 
These lead us to : 

<^A-<^3<^6 ^ 2-723,384, M^-<l>s'h = 1-211,194, 

9i9i—'Pa 9294—93 



A,= 



<^2. <^3» ^6 



= 1-745,622. 



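Our successive conditions are therefore:

$$\eta^2 - r^2 = .222,596,$$

$$\eta^2 - r^2 - \frac{\epsilon^2}{\phi_2} = .069,266,$$

$$\eta^2 - r^2 - \frac{\epsilon^2}{\phi_2} - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} = .010,186,$$

$$\eta^2 - r^2 - \frac{\epsilon^2}{\phi_2} - \frac{(\zeta\phi_2 - \epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4 - \phi_3^2)} - \frac{\{\theta(\phi_2\phi_4 - \phi_3^2) - \epsilon(\phi_4\phi_5 - \phi_3\phi_6) - \zeta(\phi_2\phi_6 - \phi_3\phi_5)\}^2}{(\phi_2\phi_4 - \phi_3^2)\,\Delta} = [\text{value illegible in the source}],$$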

whence we see the successive approximations to the fulfilment of the conditions.
Clearly great gains arise when we pass from linear to parabolic, and from parabolic
to cubic regression, but the advance is not so conspicuous when we pass to quartic
regression.
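The successive conditions can be read as the correlation ratio squared less the fraction of variance accounted for by a least-squares polynomial of rising degree. The following Python sketch rests on that reading, which is an assumption of this note rather than a statement taken from the memoir; the function names and the data are invented for illustration, and the Equisetum material itself is not reproduced.

```python
# Sketch: eta^2 minus the variance fraction explained by least-squares
# polynomials of degree 1..4. Data and names are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def successive_conditions(xs, ys, max_degree=4):
    """Return (eta2, gaps) with gaps[k-1] = eta^2 - R_k^2 for degree k."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    xc = [x - x_mean for x in xs]      # centring keeps the powers well scaled
    yc = [y - y_mean for y in ys]
    total_ss = dot(yc, yc)

    # Correlation ratio: between-array sum of squares over total sum of squares.
    arrays = {}
    for x, y in zip(xs, yc):
        arrays.setdefault(x, []).append(y)
    between_ss = sum(sum(v) ** 2 / len(v) for v in arrays.values())
    eta2 = between_ss / total_ss

    # Gram-Schmidt on 1, x, x^2, ...: each new orthogonal column contributes
    # its own share of explained variance, one subtraction per degree.
    basis, explained, gaps = [], 0.0, []
    for k in range(max_degree + 1):
        col = [x ** k for x in xc]
        for q in basis:
            coef = dot(col, q) / dot(q, q)
            col = [c - coef * qi for c, qi in zip(col, q)]
        basis.append(col)
        if k >= 1:
            explained += dot(col, yc) ** 2 / (dot(col, col) * total_ss)
            gaps.append(eta2 - explained)
    return eta2, gaps

# Invented arrays of unequal size at ten positions.
xs, ys = [], []
for position in range(1, 11):
    centre = 10.0 - 0.4 * (position - 4) ** 2 + 0.02 * (position - 4) ** 3
    for offset in (-1.0, 0.0, 0.5, 1.5)[: 2 + position % 3]:
        xs.append(position)
        ys.append(centre + offset)

eta2, gaps = successive_conditions(xs, ys)
for degree, gap in enumerate(gaps, start=1):
    print(f"degree {degree}: eta^2 - R^2 = {gap:.6f}")
```

Because a polynomial in the position can never describe the array means better than the means themselves, each printed quantity is non-negative and can only fall as the degree rises, which is the pattern shown by the four numerical values above.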

We have:

From (lxxvi.): $b_4 = .044,517$, and $b_2 = -.648,122$, $b_3 = .171,260$,
From (lxxix.): $b_4 = .151,842$, and $b_2 = -.940,410$, $b_3 = .041,981$,
From (lxxxi.): $b_4 = .025,999$, and $b_2 = -.597,691$, $b_3 = .193,688$.

The equations to the three corresponding quartics are:

(a). $Y_{x_p} = 1.724,611 - .913,208 X_p - .169,311 X_p^2 + .012,629 X_p^3 + .000,927 X_p^4$,
(b). $Y_{x_p} = 2.047,717 - .734,966 X_p - .245,667 X_p^2 + .003,096 X_p^3 + .003,161 X_p^4$,
(c). $Y_{x_p} = 1.668,788 - .944,192 X_p - .156,137 X_p^2 + .014,283 X_p^3 + .000,541 X_p^4$.

The values of $Y_{x_p}$ and $X_p$ are as before measured from the means, or 7.216,851 and
6.403,315 respectively.

The values of the observed and calculated ordinates are given in Table IX., and 
the graph of the results in the lower half of Diagram V. 
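As a check on the transcription of the three quartics, the ordinates can be recomputed directly. The Python sketch below assumes that 7.216,851 is the mean number of branches and 6.403,315 the mean position, and that the signs of the cubic and quartic terms of (b) are both positive; the names `QUARTICS` and `ordinate` are illustrative only.

```python
# Sketch: evaluate quartics (a), (b), (c) at a whorl position.
# Coefficients are in ascending powers of X_p (position minus mean position).
X_MEAN = 6.403315   # assumed: mean position of whorl
Y_MEAN = 7.216851   # assumed: mean number of branches

QUARTICS = {
    "a": (1.724611, -0.913208, -0.169311, 0.012629, 0.000927),
    "b": (2.047717, -0.734966, -0.245667, 0.003096, 0.003161),
    "c": (1.668788, -0.944192, -0.156137, 0.014283, 0.000541),
}

def ordinate(label, position):
    """Mean number of branches given by one quartic at the stated position."""
    x = position - X_MEAN
    value = 0.0
    for coefficient in reversed(QUARTICS[label]):   # Horner's rule
        value = value * x + coefficient
    return Y_MEAN + value

for position in (1, 8, 16):
    print(position, [round(ordinate(label, position), 3) for label in "abc"])
```

Under these assumptions the computed ordinates at positions 1 and 16 agree with the tabulated entries to within about a unit in the third decimal place.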



Table IX. Mean Number of Branches to Whorl in Equisetum deduced from Quartic
Regression.

Position   Quartic (a)   Quartic (b)   Quartic (c)   Observed
    1         7.731         8.269         7.637        7.619
    2         8.950         8.662         9.000        9.294
    3         9.715         9.222         9.800        9.627
    4        10.014         9.674        10.073        9.730
    5         9.858         9.816         9.866        9.643
    6         9.281         9.521         9.240        9.427
    7         8.339         8.740         8.270        8.732
    8         7.109         7.498         7.042        7.297
    9         5.692         5.898         5.656        5.555
   10         4.209         4.116         4.225        3.964
   11         2.816         2.407         2.875        2.443
   12         1.651         1.100         1.745        1.866
   13         0.930         0.600         0.987        1.462
   14         0.857         1.389         0.766        1.333
   15         1.665         4.022         1.259        1.250
   16         3.609         9.133         2.657        1.000

From these results we deduce the following conclusions : — 

(i.) That the use of a quartic instead of a cubic regression curve has not very 
markedly bettered the fit. The failure to get a closer fit lies largely in the nature of 
the material. The number of plants with more than 13 whorls is very few, and their 
contribution allows little weight to the tail of the regression curve. Further, all our 
attempts to fit a smooth regression curve show that the observed data are unduly
flattened at the top. If we confine ourselves to a homogeneous series of 110 plants 
with ten whorls apiece, we get a remarkably good fit.* The S-shape of the 
regression line as indicated in both cubic and quartic does, however, appear to be 
characteristic of the nature of the plant, and I take it that more ample material 
would allow of a closer analytical description by a simple cubic. I doubt whether for 
practical statistics the use of the quartic will often be requisite. 

(ii.) The comparative failure of the quartic (b) shows us that a formula like (lxxix.)
is of small service. This corresponds fully to our experience in the use of (lx.) in the
case of the cubic. In both cases we get rid of a high moment by making a certain
constant the ratio of two small quantities, and experience shows us that the result is
unsatisfactory. It is accordingly preferable to use formulae involving high moments
of one variable rather than those with a ratio of small quantities.

(iii.) The quartic (c) appears as good as, if not slightly better than, quartic (a). In
(c) we have got rid of a high product moment, $\theta$, by supposing the quartic condition
(lxxx.) rigidly fulfilled. This of course is not the case. It is clear that product
moments like $\theta$ of the 5th order are far from advantageous, and this is the same principle
which was in evidence when we found (lxv.) giving better results than (lxiv.) for
parabolic regression. Hence we must further conclude that the use of third, fourth or
fifth product moments is disadvantageous as compared respectively with fifth to eighth
moments of one variable. Or, a moment two degrees higher is preferable to a product
moment in calculating correlation values. This is, I think, consonant with our
knowledge of the relative magnitude of the probable errors in the two cases.
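The comparison drawn in conclusions (ii.) and (iii.) can be roughly checked from Table IX. alone. The Python sketch below uses an unweighted mean absolute deviation, which is an assumption of this note: the memoir judges fit with regard to the frequency of each array, and those frequencies are not given in this section. The table is entered as transcribed, with the fifth position label restored.

```python
# Sketch: unweighted closeness of quartics (a), (b), (c) to the observed means.
# TABLE_IX maps position -> (quartic a, quartic b, quartic c, observed).
TABLE_IX = {
    1: (7.731, 8.269, 7.637, 7.619),
    2: (8.950, 8.662, 9.000, 9.294),
    3: (9.715, 9.222, 9.800, 9.627),
    4: (10.014, 9.674, 10.073, 9.730),
    5: (9.858, 9.816, 9.866, 9.643),
    6: (9.281, 9.521, 9.240, 9.427),
    7: (8.339, 8.740, 8.270, 8.732),
    8: (7.109, 7.498, 7.042, 7.297),
    9: (5.692, 5.898, 5.656, 5.555),
    10: (4.209, 4.116, 4.225, 3.964),
    11: (2.816, 2.407, 2.875, 2.443),
    12: (1.651, 1.100, 1.745, 1.866),
    13: (0.930, 0.600, 0.987, 1.462),
    14: (0.857, 1.389, 0.766, 1.333),
    15: (1.665, 4.022, 1.259, 1.250),
    16: (3.609, 9.133, 2.657, 1.000),
}

def mean_abs_deviation(column, positions):
    """Unweighted mean absolute gap between one quartic and the observed means."""
    gaps = [abs(TABLE_IX[p][column] - TABLE_IX[p][3]) for p in positions]
    return sum(gaps) / len(gaps)

for label, column in (("a", 0), ("b", 1), ("c", 2)):
    all_positions = mean_abs_deviation(column, range(1, 17))
    first_thirteen = mean_abs_deviation(column, range(1, 14))
    print(f"quartic ({label}): all 16 = {all_positions:.3f}, first 13 = {first_thirteen:.3f}")
```

On this crude measure quartic (b) is the worst of the three whether or not the sparse arrays beyond the thirteenth position are included, while (a) and (c) lie close together.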

(14.) General Conclusions. 

(i.) The present paper provides us with a general method of dealing with the 
regression line and the variability of arrays in the case of skew correlation, without 
any assumption as to the analytical form of the skew correlation surface. 

(ii.) It provides a nomenclature and classification of the types of array variability 
which may be of service. 

Arrays are either homoclitic or heteroclitic, according as their skewnesses are of
equal magnitude or not. Arrays are further homoscedastic or heteroscedastic,
according as their standard deviations are alike or different. Skew arrays are termed
allocurtic; if arrays are symmetrical about their mean, they are isocurtic.

A heteroclitic system of arrays may be nomic or anomic, according as the skewness 
of the arrays changes continuously or irregularly with the position of the array. 

A heteroscedastic system of arrays is also either nomic or anomic, according as the
standard deviation of the arrays changes continuously or irregularly with the
position of the arrays. Anomic heteroclisy and anomic heteroscedasticity probably only
signify that our material is either heterogeneous or too sparse to free us from the
large errors of random sampling in the extreme arrays. Still the terms will be
found of use in describing the actual data.

* 'Roy. Soc. Proc.,' vol. 71, p. 308.

The curve in which the skewness of the array is plotted to its position is termed 
the clitic curve ; the curve in which the ratio of the standard deviation of the array 
to the standard deviation of the character in the population at large is plotted to 
position is termed a scedastic curve. 
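The two curves just defined can be tabulated from grouped data as follows. This Python sketch is illustrative only: the moment coefficient of skewness (third moment over the 3/2 power of the second) stands in for whatever measure of skewness the memoir itself employs, and the function names and the small data set are invented.

```python
# Sketch: ordinates of the scedastic and clitic curves from grouped data.
from math import sqrt

def central_moments(values):
    """Return the mean and the second and third moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m2, m3

def scedastic_and_clitic(arrays):
    """arrays maps position -> list of values; returns position -> (sd ratio, skewness)."""
    population = [y for ys in arrays.values() for y in ys]
    _, population_var, _ = central_moments(population)
    curves = {}
    for position in sorted(arrays):
        _, m2, m3 = central_moments(arrays[position])
        sd_ratio = sqrt(m2 / population_var)            # scedastic ordinate
        skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0    # clitic ordinate (assumed measure)
        curves[position] = (sd_ratio, skewness)
    return curves

# Hypothetical arrays: the first symmetrical, the second skew.
example = {1: [1.0, 2.0, 3.0], 2: [1.0, 1.0, 4.0]}
for position, (sd_ratio, skewness) in scedastic_and_clitic(example).items():
    print(position, round(sd_ratio, 4), round(skewness, 4))
```

A system would be called homoscedastic if the first column were sensibly constant and homoclitic if the second were; plotting either column against position shows at once whether the change is continuous (nomic) or irregular (anomic).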

(iii.) The types of regression have been classified into linear, parabolic, cubic and 
quartic. For most practical purposes the first three suffice. Necessary criteria 
have been given for each case. But as in the case of the skew frequency of one 
character, an indefinite number of conditions ought theoretically to be fulfilled. 
Practically in dealing with frequency, no criteria are absolutely fulfilled, and the 
probable errors of the expressions used become unmanageable as we ascend in the
scale. We must therefore be content to estimate the degree of approximation with 
which one or two necessary criteria are satisfied. 

The fundamental test of deviation from the familiar form of linear regression is the
inequality of the correlation coefficient $r$ and the newly introduced correlation
ratio $\eta$. The probable error of this latter is determined. It is shown that
$\sigma_y\sqrt{1 - \eta^2}$ is the mean standard deviation of a system of arrays in skew correlation.
The ease with which $\eta$ can be calculated suggests that in many cases it should
accompany, if not replace, the determination of the correlation coefficient.
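The relation between the correlation ratio and the variability of the arrays can be verified numerically. In the Python sketch below the "mean standard deviation" is read as the square root of the frequency-weighted mean of the array variances, which is an interpretation adopted for this note; the data and function names are invented.

```python
# Sketch: correlation ratio, correlation coefficient, and the array variability.
from math import sqrt

def correlation_ratio(arrays):
    """arrays maps each x to its list of y values; returns (eta, sigma_y)."""
    all_y = [y for ys in arrays.values() for y in ys]
    n = len(all_y)
    y_mean = sum(all_y) / n
    var_y = sum((y - y_mean) ** 2 for y in all_y) / n
    between = sum(len(ys) * (sum(ys) / len(ys) - y_mean) ** 2
                  for ys in arrays.values()) / n
    return sqrt(between / var_y), sqrt(var_y)

def mean_array_variance(arrays):
    """Frequency-weighted mean of the variances within the arrays."""
    n = sum(len(ys) for ys in arrays.values())
    total = 0.0
    for ys in arrays.values():
        m = sum(ys) / len(ys)
        total += sum((y - m) ** 2 for y in ys)   # n_x times the array variance
    return total / n

def correlation_coefficient(arrays):
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical arrays with a curved run of means.
data = {1: [2.0, 3.0, 4.0], 2: [5.0, 7.0], 3: [6.0, 6.5, 7.5, 8.0], 4: [5.0, 6.0]}
eta, sigma_y = correlation_ratio(data)
r = correlation_coefficient(data)
print("eta =", round(eta, 4), " r =", round(r, 4))
print(sigma_y * sqrt(1 - eta ** 2), sqrt(mean_array_variance(data)))
```

The last line prints the same quantity twice, and the excess of eta over the numerical value of r measures how far the array means depart from a straight line.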

In the determination of the constants of the regression curve we must use 
moments and product moments. The limitations to the order of the curve used 
depend : (a) on the labour of the arithmetic, (b) on the increasing probable errors of 
the higher moments and product moments. For these reasons it seems idle to propose 
going beyond the 6th to 8th moments, or the 3rd to 5th product-moments. Practical
experience suggests that little is to be gained by using moments beyond the [ordinal
illegible in the source], or product moments beyond the 3rd. A quartic regression curve
may be useful occasionally, but it has yet to justify its necessity. As our object is not
to reproduce the given data, but to provide a graduation for them, which smooths down the
errors of random sampling, we believe that any legitimate and practical theory must
discard the high moments and high product moments with which Thiele and Lipps
propose to deal.

(iv.) There is one point to which reference ought to be made. Some reader may
enquire why the method of my paper on curve fitting* should not be applied
to these regression curves in general, as we have in practice once or twice
already applied it. It would seem that that method is the easier, involving in the
case of the quartic only quantities analogous to our $r$, $\epsilon$, $\zeta$ and $\theta$. The answer is
straightforward: that process supposes every $y_{x_p}$ to have equal weight, or $n_{x_p}$ to be
the same for each array. Hence the higher moments of the $x$-character, which are
really involved, can be written down without calculation once and for all.* The
complexity of our present investigation arises from the introduction of the weighting
into the calculation of the moments of the $x$-character, as well as into that of the
product moments $r$, $\epsilon$, $\zeta$, $\theta$. Our results therefore, although they might not look so
good on a graph of the regression curve, would be markedly better, if due weight
were given to the frequency of each array. The difference of the two conceptions is
comparable to the determination of the regression on the one hand from the
correlation coefficient, and on the other from merely striking a line through the
plotted means of the arrays. The method of moments in the present case, if we
except the use of $\eta$, is identical with that of fitting a curve to a continuum in space
by the method of least squares.

* "On the Systematic Fitting of Curves to Observations and Measurements." 'Biometrika,'
vol. I., pp. 265-303, and vol. II., pp. 1-23, especially the latter, pp. 11-15.
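The distinction between the two conceptions can be shown on a small invented example. In the Python sketch below, a least-squares line fitted to the individual observations coincides with one fitted to the array means when each mean is weighted by the frequency of its array, whereas a line struck through the unweighted means differs; the data and the helper `fit_line` are hypothetical.

```python
# Sketch: weighting array means by frequency versus treating them alike.

def fit_line(points, weights):
    """Weighted least-squares line y = a + b x; returns (a, b)."""
    w = sum(weights)
    mx = sum(wi * x for wi, (x, _) in zip(weights, points)) / w
    my = sum(wi * y for wi, (_, y) in zip(weights, points)) / w
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(weights, points))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(weights, points))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical arrays: numerous at the low positions, sparse at the high.
arrays = {1: [2.0, 2.2, 1.8, 2.0], 2: [3.0, 3.4], 3: [3.5], 4: [7.0]}
individuals = [(x, y) for x, ys in arrays.items() for y in ys]
means = [(x, sum(ys) / len(ys)) for x, ys in arrays.items()]
counts = [len(ys) for ys in arrays.values()]

from_individuals = fit_line(individuals, [1.0] * len(individuals))
weighted_means = fit_line(means, counts)
unweighted_means = fit_line(means, [1.0] * len(means))

print("individual observations:", from_individuals)
print("means weighted by n_x:  ", weighted_means)
print("means, equal weight:    ", unweighted_means)
```

The single outlying array at the last position pulls the equally weighted line more strongly than it pulls the frequency-weighted one, which is the effect of sparse extreme arrays discussed above.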

(v.) No stress whatever is laid on the actual instances here selected for illustration of 
the methods of this paper. I have merely chosen out of available material cases in 
which I had come across skew regression of various types. Thus we find : — 

(a.) The correlation of the number of branches and position of the whorl in 
Asperula odorata is practically parabolic, homoscedastic and of nomic heteroclisy. 

(b.) The correlation between auricular height of head and age in girls is cubical,
of nomic heteroscedasticity and of anomic heteroclisy. It is probably really a case 
of isocurtosis. 

(c.) The correlation of size of cell and size of body in Daphnia magna, allowing 
for the irregularities produced by the ecdyses, is parabolic or cubic, of nomic 
heteroscedasticity, and probably, but for the above-mentioned irregularities, of 
isocurtic homoclisy. 

(d.) The correlation of the number of branches and position of the whorl in
Equisetum arvense is cubical or possibly even quartic, of markedly nomic
heteroscedasticity and markedly nomic heteroclisy.

It is not impossible that slips have occurred in the lengthy arithmetic involved, but
every important piece of work has been done independently twice, once by Dr. Alice
Lee, whom I have most heartily to thank for her unwearying assistance, and once
by myself. To preserve uniformity of working, the constants have in each case
been carried to six figures. This involves little or no additional trouble, using as we
do mechanical calculators. The final results are of course of no value beyond their
probable errors, which will be in the second or third place of figures. No doubt I
shall be told that there is a show of accuracy in the number of decimal figures
retained, which does not really exist. It does not exist (and I am as fully conscious
of its non-existence as any would-be critic) so far as our results fit the actual
population, of which we have but a random sample. The figures, however, are of
importance, as far as testing accuracy of fit of result to actual sample goes. The
cubic or quartic curves may have coefficients insensible before the third or fourth
figure of decimals, and these coefficients have to be multiplied occasionally by
abscissae of the third or fourth powers of 7 to 9. Hence to get ordinates true, as
far as the sample goes, to the second or third figure, we require to work to a fairly
high number of figures. There is no magic in six figures, four or five would probably
satisfy another worker, but they are easily read off the calculator we use, and if the
constants had been tabled only to four or five, no reader would have been able to
agree exactly, if he wished to test any of our results, even to three figures, with the
final ordinates.

* 'Biometrika,' vol. II., p. 12.
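The effect of curtailing the constants can be seen on quartic (a) at the sixteenth position, where the abscissa measured from the mean exceeds 9. In the Python sketch below, "four figures" is taken to mean four places of decimals, which is an assumption of this note, as are the names used and the reading of 6.403,315 and 7.216,851 as the mean position and mean number of branches.

```python
# Sketch: ordinate of quartic (a) at position 16 with full and curtailed constants.
X_MEAN, Y_MEAN = 6.403315, 7.216851          # assumed means of position and branches
QUARTIC_A = (1.724611, -0.913208, -0.169311, 0.012629, 0.000927)

def ordinate(coefficients, position):
    """Evaluate the quartic, coefficients in ascending powers of (position - mean)."""
    x = position - X_MEAN
    return Y_MEAN + sum(c * x ** k for k, c in enumerate(coefficients))

full = ordinate(QUARTIC_A, 16)
curtailed = ordinate(tuple(round(c, 4) for c in QUARTIC_A), 16)
print(f"six decimals: {full:.3f}   four decimals: {curtailed:.3f}")
```

The two ordinates differ already in the first decimal place, because the fourth power of the abscissa runs to several thousands and magnifies the figures dropped from the smallest coefficient.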



DIAGRAM I. SKEW CORRELATION IN ASPERULA ODORATA.
DIAGRAM III. SKEW CORRELATION BETWEEN SIZES OF CELL AND BODY IN DAPHNIA.
DIAGRAM IV. SKEW CORRELATION BETWEEN BRANCHES AND POSITION OF WHORL IN EQUISETUM: SCEDASTIC AND CLITIC CURVES.
DIAGRAM V. SKEW CORRELATION BETWEEN BRANCHES AND POSITION OF WHORL IN EQUISETUM: REGRESSION CURVES.